
Generate using exported model and enable gemma2-2b in ExecuTorch #33707

Merged: 5 commits, Oct 11, 2024

Conversation

@guangy10 (Contributor) commented Sep 26, 2024

What does this PR do?

Adding generate support for the exported model.
Adding gemma2-2b to ExecuTorch with tests.
Adding an integration test for gemma-2b, which we've enabled already.
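For context, here is a minimal sketch of what "generate" against an exported model boils down to: a prefill pass over the prompt followed by a token-by-token greedy decode loop. This is an illustration only, not the helper added in this PR; the exported_forward shape contract and the greedy_generate name are assumptions for the sketch.

import torch

def greedy_generate(exported_forward, prompt_ids, max_new_tokens, eos_id=None):
    """Greedy decoding against an exported decoder (illustrative sketch).

    Assumed contract: exported_forward(input_ids, cache_position) -> logits, with the
    KV cache held as static buffers inside the exported program, so one token is fed
    per call (input_ids [1, 1], cache_position [1], logits [1, 1, vocab_size]).
    """
    generated = list(prompt_ids)
    # Prefill: run the prompt one position at a time to populate the static cache.
    for pos, tok in enumerate(prompt_ids):
        logits = exported_forward(
            torch.tensor([[tok]], dtype=torch.long),
            torch.tensor([pos], dtype=torch.long),
        )
    next_tok = int(torch.argmax(logits[:, -1, :], dim=-1))
    # Decode: append the argmax token and feed it back until EOS or the budget runs out.
    for _ in range(max_new_tokens):
        generated.append(next_tok)
        if eos_id is not None and next_tok == eos_id:
            break
        logits = exported_forward(
            torch.tensor([[next_tok]], dtype=torch.long),
            torch.tensor([len(generated) - 1], dtype=torch.long),
        )
        next_tok = int(torch.argmax(logits[:, -1, :], dim=-1))
    return generated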

Additional Test in ExecuTorch

Running gemma2-2b E2E:

cmake-out/examples/models/llama2/llama_main --tokenizer_path=tokenizer_gemma2.bin --model_path=gemma2.pte --prompt="My name is"
I 00:00:00.001356 executorch:cpuinfo_utils.cpp:62] Reading file /sys/devices/soc0/image_version
I 00:00:00.001425 executorch:cpuinfo_utils.cpp:78] Failed to open midr file /sys/devices/soc0/image_version
I 00:00:00.001431 executorch:cpuinfo_utils.cpp:158] Number of efficient cores 4
I 00:00:00.001434 executorch:main.cpp:65] Resetting threadpool with num threads = 6
I 00:00:00.005564 executorch:runner.cpp:65] Creating LLaMa runner: model_path=gemma2.pte, tokenizer_path=tokenizer_gemma2.bin
E 00:00:03.701808 executorch:tiktoken.cpp:79] invalid tiktoken line:
I 00:00:03.701845 executorch:runner.cpp:88] Failed to load tokenizer_gemma2.bin as a Tiktoken artifact, trying BPE tokenizer
I 00:00:03.767485 executorch:runner.cpp:94] Reading metadata from model
I 00:00:03.767512 executorch:runner.cpp:119] Metadata: get_vocab_size = 256000
I 00:00:03.767515 executorch:runner.cpp:119] Metadata: get_bos_id = 2
I 00:00:03.767517 executorch:runner.cpp:117] Methond use_sdpa_with_kv_cache not found, using the default value 0
I 00:00:03.767518 executorch:runner.cpp:119] Metadata: use_sdpa_with_kv_cache = 0
I 00:00:03.767520 executorch:runner.cpp:119] Metadata: get_n_eos = 1
I 00:00:03.767521 executorch:runner.cpp:117] Methond append_eos_to_prompt not found, using the default value 0
I 00:00:03.767522 executorch:runner.cpp:119] Metadata: append_eos_to_prompt = 0
I 00:00:03.767524 executorch:runner.cpp:119] Metadata: get_max_seq_len = 123
I 00:00:03.767525 executorch:runner.cpp:117] Methond enable_dynamic_shape not found, using the default value 0
I 00:00:03.767527 executorch:runner.cpp:119] Metadata: enable_dynamic_shape = 0
I 00:00:03.767529 executorch:runner.cpp:119] Metadata: use_kv_cache = 1
I 00:00:03.767575 executorch:runner.cpp:119] Metadata: get_n_bos = 1
I 00:00:03.767604 executorch:runner.cpp:167] RSS after loading model: 0.000000 MiB (0 if unsupported)
I 00:00:04.408489 executorch:runner.cpp:234] RSS after prompt prefill: 0.000000 MiB (0 if unsupported)
My name is Lale and I love to play with my dolls. I started to play with dolls at the age of two years old. My favorite activity is dancing. I would like to help people. I would like to travel to Spain. I like to be a vet. I love to help people. I would like to travel. I would like to be. I love to travel. I would like to be. I like to travel. I would like to be. I love to travel. I would like to be. I love to travel. I would like to be. I love to travel
I 00:00:23.188418 executorch:runner.cpp:246] RSS after finishing text generation: 0.000000 MiB (0 if unsupported)
PyTorchObserver {"prompt_tokens":4,"generated_tokens":118,"model_load_start_ms":1727310065089,"model_load_end_ms":1727310068851,"inference_start_ms":1727310068851,"inference_end_ms":1727310088272,"prompt_eval_end_ms":1727310069492,"first_token_ms":1727310069492,"aggregate_sampling_time_ms":180,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
I 00:00:23.188436 executorch:stats.h:84] 	Prompt Tokens: 4    Generated Tokens: 118
I 00:00:23.188438 executorch:stats.h:90] 	Model Load Time:		3.762000 (seconds)
I 00:00:23.188440 executorch:stats.h:100] 	Total inference time:		19.421000 (seconds)		 Rate: 	6.075897 (tokens/second)
I 00:00:23.188442 executorch:stats.h:108] 		Prompt evaluation:	0.641000 (seconds)		 Rate: 	6.240250 (tokens/second)
I 00:00:23.188444 executorch:stats.h:119] 		Generated 118 tokens:	18.780000 (seconds)		 Rate: 	6.283280 (tokens/second)
I 00:00:23.188446 executorch:stats.h:127] 	Time to first generated token:	0.641000 (seconds)
I 00:00:23.188480 executorch:stats.h:134] 	Sampling time over 122 tokens:	0.180000 (seconds)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker
@gante
@amyeroberts
@qubvel

@guangy10 guangy10 marked this pull request as ready for review September 26, 2024 01:16
@guangy10 guangy10 mentioned this pull request Sep 26, 2024
@guangy10 (Contributor Author):

Why am I starting to get a fetch_tests - Unauthorized error on all the PRs I'm creating?

@LysandreJik (Member):

This seems to be happening more often. @ydshieh, would you know what might be happening? Otherwise I'll reach out to the CircleCI team; this is hindering work on the repo.

@LysandreJik (Member):

I've asked the CircleCI team @guangy10, very sorry for the inconvenience.

@ydshieh (Collaborator) commented Sep 27, 2024

Might be related to

Allow CI could be run on private forked repositories (e.g. new model additions) (#33594)

But it's not happening on all external contributors' PRs. Strange.

@ydshieh (Collaborator) commented Sep 27, 2024

Another thing to check

If you're following the fork instead of the upstream repo

A user who submits a pull request to your repository from a fork, but no pipeline is triggered with the pull request. This can happen when the user is following the project fork on their personal account rather than the project itself on CircleCI.

This will cause the jobs to trigger under the user's personal account. If the user is following a fork of the repository on CircleCI, we will only build on that fork and not the parent, so the parent’s PR will not get status updates. 

In these cases, the user unfollows their fork of the project on CircleCI. This will trigger their jobs to run under the organization when they submit pull requests. Those users can optionally follow the source project if they wish to see the pipelines.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@guangy10 (Contributor Author) commented Sep 27, 2024

Could you first try ..?

https://support.circleci.com/hc/en-us/articles/360048210711-How-to-Refresh-User-Permissions

Weird. It used to work fine on my old PRs. I tried "Refresh Permissions". Let's see if it can be unblocked.

@ydshieh It's getting worse after refreshing the permissions. check_circleci_user starts failing.

@guangy10 (Contributor Author):

(quoting the CircleCI guidance from the comment above)

"In these cases, the user unfollows their fork of the project on CircleCI. " I have no idea how to unfollow my fork on CircleCI. I don't even see if I'm following transformers on CircleCI. I created a CircleCI account with exact email and linked to my github, and I can't see any project I'm following..

@guangy10 (Contributor Author):

https://support.circleci.com/hc/en-us/articles/360008097173-Troubleshooting-why-pull-requests-are-not-triggering-jobs-on-my-organization

so please be sure "Build forked pull requests" is enabled in Project Settings > Advanced

@ydshieh Their wiki is so bad. I can't find this setting anywhere.

@ydshieh (Collaborator) commented Sep 28, 2024

Hmm, weird CircleCI issue ...

Could you check one last time:

Top-left (as in the image below): under organization, switch to your own org and follow transformers there.
Optionally, then switch to huggingface (if you can find it) and follow transformers there.

If it's still not working, I could try to push a commit.

[Screenshot 2024-09-28 115104]

@ydshieh (Collaborator) commented Sep 30, 2024

I pushed a commit to trigger the CI jobs. It runs now. I have to say I have no idea what's wrong on the CircleCI side with the permission issue, as other external contributors' PRs don't have it.

@guangy10 (Contributor Author):

Hi @ydshieh, thank you so much for debugging this issue together with me. Per the message from the CircleCI support team, it seems like the TRANSFORMERS_CONTEXT in .circleci/config.yml is causing the permission issue, and I was advised to remove it from config.yml. It seems it was newly added in #33594 last week. I'm still not sure why it only affects my PRs but works for other users, but I hope this message gives you more pointers to help resolve this issue on my fork and PRs. Thank you.

[Screenshot 2024-09-30 at 11:42:35 AM]

@guangy10 (Contributor Author):

Since @ydshieh pushed a commit to trigger the CI and all CIs are green, can I get a review on this PR? @amyeroberts @qubvel

@ArthurZucker (Collaborator) left a comment

Thanks! 🤗 Asked a question about the general direction, as I am wondering if there is a way for us to improve our generate to make it a bit more compatible!

src/transformers/integrations/executorch.py (review thread)
tests/models/gemma2/test_modeling_gemma2.py (review thread, outdated)
@ydshieh (Collaborator) commented Oct 1, 2024

Hi @guangy10, thank you for the message and for contacting the CircleCI team on your side too. I also suspected #33594 from the moment I first saw this issue. But as you mentioned, only a few (external) contributors face it, and the answer from CircleCI attached above is somewhat confusing (i.e. it doesn't explain what actually causes it).

I will create a new personal GitHub account and see if I can reproduce it and come up with a solution.

@ydshieh (Collaborator) commented Oct 1, 2024

Well, I am able to reproduce it now: #33850

@ydshieh (Collaborator) commented Oct 1, 2024

Opened a PR #33866 for this CircleCI issue

@guangy10 (Contributor Author) commented Oct 1, 2024

Opened a PR #33866 for this CircleCI issue

@ydshieh Thanks a lot. Really appreciate the quick fix! Let me rebase on top of the fix.

@HuggingFaceDocBuilderDev

Hey! 🤗 Thanks for your contribution to the transformers library!

Before merging this pull request, slow tests CI should be triggered. To enable this:

  • Add the run-slow label to the PR
  • When your PR is ready for merge and all reviewers' comments have been addressed, push an empty commit with the command [run-slow] followed by a comma separated list of all the models to be tested, i.e. [run_slow] model_to_test_1, model_to_test_2
    • If the pull request affects a lot of models, put at most 10 models in the commit message
  • A transformers maintainer will then approve the workflow to start the tests

(For maintainers) The documentation for slow tests CI on PRs is here.

@guangy10 (Contributor Author) commented Oct 1, 2024

  • Add the run-slow label to the PR

@ydshieh I don't see a way for me to add the run-slow label to the PR. Is there one?

@ydshieh (Collaborator) commented Oct 1, 2024

Are you able to see what is shown below?

[Screenshot 2024-10-01 200227]

@guangy10 (Contributor Author) commented Oct 1, 2024

Are you able to see what is shown below?

[Screenshot 2024-10-01 200227]

No, that's why I'm asking whether the message is directed at me or at the repo maintainers. I'm also wondering if we have something like pytorchbot that can do labeling for any user without write permission to the repo.

[Screenshot 2024-10-01 at 11:29:21 AM]

@ydshieh (Collaborator) commented Oct 1, 2024

Thanks for the feedback. We will improve this. For now, I just added it manually here.

@guangy10 (Contributor Author) commented Oct 2, 2024

@ArthurZucker @qubvel could you help review this PR? Once it's merged we can add such integration tests for all ExecuTorch-compatible models, testing not only the exportability but also generate.

@ydshieh (Collaborator) commented Oct 3, 2024

Don't forget to push an (empty) commit with the message [run_slow] gemma, gemma2. Thank you!

@ArthurZucker (Collaborator) left a comment

Thanks 🤗

src/transformers/integrations/executorch.py (review thread, outdated)
src/transformers/integrations/executorch.py (review thread)
tests/models/gemma/test_modeling_gemma.py (review thread)
@guangy10 (Contributor Author) commented Oct 3, 2024

Comments addressed.
Does it require a 2nd reviewer in order to merge?

@ArthurZucker (Collaborator):

Nope, just waiting for the CIs right now!


tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b", pad_token="</s>", padding_side="right")
EXPECTED_TEXT_COMPLETION = [
"Hello I am doing a project on the 1990s and I need to know what the most popular music was in the 1990s. I have looked on the internet and I have found a few things but I need more. I have found that the most popular music was rap",
@guangy10 (Contributor Author):

@ArthurZucker I see a test failure that may be related to this. I copied this over from another model with additional text added; feel free to truncate it to a shorter message and push a new commit.

Collaborator:

Ah yeah, I won't have time to update them tonight! But yeah, we can merge otherwise!

@guangy10 (Contributor Author):

@ArthurZucker Let me know if there is anything I can do to merge this PR

Collaborator:

Sorry, I lost track. I was wondering if you could update the expected values; otherwise we can merge and @ydshieh will take care of them!

Collaborator:

I will check this afternoon and merge

@guangy10 (Contributor Author):

Could you share your torch version ..?

Mine is a dev version 2.6.0.dev20241007. What is the torch version where the test fails?

attn_implementation = None

It doesn't look correct. If we set it to None, what is the default attention being used? If I recall correctly, SDPA is the only attention impl that supports StaticCache.

@guangy10 (Contributor Author):

Yeah, the torch version seems to be the issue. Let me dig into it and I will update here shortly.

@guangy10 (Contributor Author) commented Oct 10, 2024

@ydshieh Okay, I can confirm that exporting the gemma2 model requires torch==2.5.0 in order to work correctly.
I also verified that running the test on torch==2.4.1 or torch==2.0.0 yields the original prompt as output.

Here is the detailed package info:

pip list | grep torch
executorch                         0.5.0a0+f8cec53
executorchcoreml                   0.0.1
torch                              2.5.0
torchaudio                         2.5.0
torchvision                        0.20.0

RUN_SLOW=1 pytest tests/models/gemma2/test_modeling_gemma2.py -k test_export_static_cache -v

=========================================================================================== test session starts ===========================================================================================
platform darwin -- Python 3.10.13, pytest-7.2.0, pluggy-1.0.0 -- /Users/guangyang/miniconda3/envs/executorch/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/guangyang/transformers/.hypothesis/examples')
rootdir: /Users/guangyang/transformers, configfile: pyproject.toml
plugins: cov-4.1.0, anyio-4.4.0, xdist-3.3.1, hypothesis-6.84.2
collected 389 items / 388 deselected / 1 selected

tests/models/gemma2/test_modeling_gemma2.py::Gemma2IntegrationTest::test_export_static_cache PASSED                                                                                                 [100%]

BTW, the model does require torchvision >= 0.19.0 and will fail with a lower version. It's already implied by requiring torch>=2.5.0.

@guangy10 (Contributor Author):

I updated the PR to bump up the required torch version to 2.5.0.

torch==2.5.0 will not be publicly available until Oct 17, 2024. We can either merge this PR as-is, in which case test_export_static_cache for gemma2 will be skipped in the CI until torch 2.5.0 is available in transformers, or we can defer gemma2 enablement until torch 2.5.0 is available in transformers by splitting the gemma2-related code out of this PR and merging the rest. Personally I would prefer the former, but let me know how you want to proceed. cc: @ArthurZucker @ydshieh
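For reference, a skip-on-version gate for the slow test could look roughly like the sketch below. The require_torch_2_5 helper name is hypothetical; the actual decorator used in transformers may differ.

import unittest

import torch
from packaging import version

def require_torch_2_5(test_case):
    # Hypothetical helper: skip the decorated test unless torch >= 2.5.0 is installed.
    if version.parse(torch.__version__).release < (2, 5, 0):
        return unittest.skip("gemma2 export requires torch >= 2.5.0")(test_case)
    return test_case

Applied to test_export_static_cache, this would mark the test as skipped on the torch 2.4.x CI runners and let it run once the pin moves to 2.5.0.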

Collaborator:

Great! We are still using Python 3.8 and torch 2.4.1. I plan to switch to Python 3.9 at the end of October. Before that, torch 2.5 won't be available in the system. We will think about that.

We can merge this PR as it is. Thank you for checking (it works indeed).

If we set it to None, what is the default attention being used?

Just the native implementation of attention (in the modeling files) using torch operators.

If I recall correctly, SDPA is the only attention impl that supports StaticCache.

The native implementation of attention also works with StaticCache I believe.
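For illustration, a small sketch (the model id and generation settings are only examples, not taken from this PR) showing that static-cache generation runs with either the eager (native) or SDPA attention implementation:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("My name is", return_tensors="pt")

for impl in ("eager", "sdpa"):
    model = AutoModelForCausalLM.from_pretrained(model_id, attn_implementation=impl)
    # cache_implementation="static" makes generate allocate a StaticCache internally.
    out = model.generate(**inputs, cache_implementation="static", max_new_tokens=16, do_sample=False)
    print(impl, tokenizer.decode(out[0], skip_special_tokens=True))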

@guangy10 (Contributor Author):

Debug the gemma2 export issue and update the required torch version for it.

@ydshieh ydshieh merged commit 7d97cca into huggingface:main Oct 11, 2024
17 of 20 checks passed
NielsRogge pushed a commit to NielsRogge/transformers that referenced this pull request Oct 21, 2024
Generate using exported model and enable gemma2-2b in ExecuTorch (huggingface#33707)

* Generate using exported model and enable gemma2-2b in ExecuTorch

* [run_slow] gemma, gemma2

* truncate expected output message

* Bump required torch version to support gemma2 export

* [run_slow] gemma, gemma2

---------

Co-authored-by: Guang Yang <[email protected]>