tracker: generate compatibility with torch.compile #28981

Open
21 of 33 tasks
gante opened this issue Feb 12, 2024 · 15 comments
Labels: WIP

Comments

@gante
Member

gante commented Feb 12, 2024

generate 🤜 🤛 torch.compile

Part of the PyTorch 2024 H2 roadmap.

This issue tracks the compatibility between .generate and torch.compile (intro docs by PyTorch). The goal is to enable fullgraph=True compilation for the main generate use cases.

⚠️ Is your generate use case not covered by this tracker? Check if it was requested below and upvote it if it was. Otherwise, add a comment. We will consider expanding the selection below to cover widely requested use cases 🤗

Decoding Strategies (end-to-end compilation)

  • greedy_search / sample are compatible (Generate: end-to-end compilation #30788)
  • beam_search / beam_sample are compatible, depends on the step above
  • assisted_decoding (aka speculative decoding) is compatible, depends on the steps above
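
As a rough illustration of what end-to-end compilation of greedy decoding involves, the sketch below compiles an entire fixed-length greedy loop with fullgraph=True. Everything model-related here is an assumption made for the example: the toy "model" (an embedding plus a linear head), the greedy_decode name, and the constants are hypothetical, and backend="eager" is chosen only so the snippet runs without a GPU or a C++ toolchain. It is not how generate is implemented.

```python
import torch

torch.manual_seed(0)
VOCAB, HIDDEN, MAX_NEW = 16, 8, 4  # hypothetical toy sizes

# Toy stand-in for a language model (not a transformers model).
embed = torch.nn.Embedding(VOCAB, HIDDEN)
head = torch.nn.Linear(HIDDEN, VOCAB)


def greedy_decode(prompt: torch.Tensor) -> torch.Tensor:
    # Pre-allocate the full output so every step works on fixed shapes.
    batch, prompt_len = prompt.shape
    out = torch.zeros(batch, prompt_len + MAX_NEW, dtype=torch.long)
    out[:, :prompt_len] = prompt
    # The loop bound is a Python constant, so the tracer can unroll it.
    for step in range(MAX_NEW):
        pos = prompt_len + step
        last = out[:, pos - 1]               # most recent token per row
        logits = head(embed(last))           # (batch, VOCAB)
        out[:, pos] = logits.argmax(dim=-1)  # greedy choice
    return out


# fullgraph=True asks the tracer to error out instead of silently
# splitting the function into several graphs.
compiled_decode = torch.compile(greedy_decode, fullgraph=True, backend="eager")

prompt = torch.tensor([[1, 2, 3]])
with torch.no_grad():
    eager_out = greedy_decode(prompt)
    compiled_out = compiled_decode(prompt)
```

The design choice worth noting is the pre-allocated output buffer: growing a tensor with torch.cat at every step changes shapes between iterations, which tends to work against single-graph compilation.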

Generate Flags and Options

  • all LogitsProcessor classes were checked for compatibility (and the appropriate exceptions are raised when not compatible)
  • all StoppingCriteria classes were checked for compatibility (and the appropriate exceptions are raised when not compatible)
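
One pattern that tends to keep stopping logic traceable is to express "has this sequence finished?" as a boolean tensor rather than a Python if on tensor values. The sketch below shows that idea under stated assumptions: apply_eos_stop, EOS_ID, and PAD_ID are hypothetical names and values for illustration, not the library's StoppingCriteria classes, and backend="eager" is used only so the snippet runs without a compiler toolchain.

```python
import torch

EOS_ID, PAD_ID = 2, 0  # hypothetical token ids


def apply_eos_stop(next_tokens: torch.Tensor, unfinished: torch.Tensor):
    # Rows that already finished emit padding instead of a new token.
    next_tokens = torch.where(
        unfinished, next_tokens, torch.full_like(next_tokens, PAD_ID)
    )
    # A row stays unfinished only if it has not just produced EOS.
    # No Python branching on tensor values, so there is nothing
    # data-dependent for the tracer to trip over.
    unfinished = unfinished & (next_tokens != EOS_ID)
    return next_tokens, unfinished


compiled_stop = torch.compile(apply_eos_stop, fullgraph=True, backend="eager")

tokens = torch.tensor([5, 2, 7])
unfinished = torch.tensor([True, True, False])
new_tokens, new_unfinished = compiled_stop(tokens, unfinished)
# new_tokens     -> tensor([5, 2, 0])
# new_unfinished -> tensor([ True, False, False])
```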

Models

Notes:

  1. models tagged as "important models" in our CI + popular models
  2. language models released starting from v4.42 should ALL support compile

Decoder-only:

Encoder-decoder:

Quantization

  • BNB support
  • GPTQ support
  • AWQ support

Others

  • We have a benchmark script to quickly compare the impact of PRs
  • Add section to existing docs on the topic
  • Confirm that pipelines work after compiling generate
@gante gante changed the title generate compatibility with torch.compile tracker: generate compatibility with torch.compile Feb 12, 2024
@gante gante self-assigned this Feb 12, 2024
@gante gante mentioned this issue Mar 14, 2024
26 tasks
@huggingface huggingface deleted a comment from github-actions bot Mar 25, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@gante gante reopened this May 2, 2024
@ArthurZucker ArthurZucker added the WIP Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress label May 9, 2024
@kadirnar
Contributor

👀

@guangy10
Contributor

👀

@jiqing-feng
Contributor

👀

@mlazos

mlazos commented Aug 5, 2024

Hi @gante, I'm planning on looking at these issues from the torch.compile side - two questions: (1) is there someone from HF that's committed to working on this in the event I identify some model changes that would help run torch.compile more smoothly? and (2) are these the right models we should be focusing on? are there others that are also in need of torch.compile support?

Let me know when you can, also happy to chat on slack (I'm on the HF slack already)

@fzyzcjy
Contributor

fzyzcjy commented Sep 27, 2024

Hi, are there any updates? I am especially interested in the things related to #30647. Thanks!

@vbonnivardprobayes

vbonnivardprobayes commented Oct 15, 2024

Hi! Do you plan on adding DONUT too? It would be highly appreciated :) As its decoder is a BART one, do you plan on doing the two together, for instance?
Thanks!

@gante
Member Author

gante commented Oct 17, 2024

@vbonnivardprobayes T5 compatibility is close to being done (#34089); BART and related models will get the changes next :) (cc @zucchini-nlp)

@gante
Member Author

gante commented Oct 17, 2024

@fzyzcjy Not on beam search. I'll be separating prefill into a separate function as my next task, then I'll probably work on vectorized beam search :)

(see other tracker: #30810)

@tsengalb99

Are there any plans to add CUDA graph support for models that are partitioned over multiple GPUs?

@gante
Member Author

gante commented Oct 29, 2024

@tsengalb99 if that's possible then yes :D (multi-device is not my speciality, cc @SunMarc)

@SunMarc
Member

SunMarc commented Nov 4, 2024

Hey @tsengalb99, we are integrating TP with transformers and it is also compatible with torch.compile. @kwen2501, could you confirm that it is compatible with CUDA graphs? Otherwise, PP should also work with torch.compile and CUDA graphs on the latest PyTorch 2.5.

@joanvelja

@SunMarc so this applies to models scattered across multiple GPUs with DeepSpeed via Accelerate too?

@anijain2305
Contributor

@gante Can you update the description to reflect any new model compile support?

@zucchini-nlp
Member

I updated the list of models that already have compile compatibility and added a few more links to open PRs. This list is not comprehensive and shows commonly used models. At this moment all decoder-only models should be compile friendly, and the encoder-decoder ones will come in the next batch.
