Allow FP16 or other precision inference for Pipelines #31342
Conversation
Does anyone know how I can fix the code quality errors from here instead of running ruff locally? I don't have it set up right now.
Thanks for adding this feature!
You should be able to directly use .to on the image processor outputs.
All of the pipelines should have tests added to check they can accept and run with fp16 inputs.
Co-authored-by: amyeroberts <[email protected]>
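For reference, a minimal sketch of what "use .to on the image processor outputs" looks like in user code. The checkpoint, image URL, and the assumption that BatchFeature.to accepts a dtype are mine, not quoted from the PR:

```python
# Sketch only: cast the image processor's fp32 outputs to the model's fp16 dtype.
# Assumes a CUDA device and the google/vit-base-patch16-224 checkpoint.
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

ckpt = "google/vit-base-patch16-224"
model = AutoModelForImageClassification.from_pretrained(ckpt, torch_dtype=torch.float16).to("cuda")
processor = AutoImageProcessor.from_pretrained(ckpt)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
# .to on the BatchFeature moves tensors to the device and casts the
# floating-point ones (pixel_values) to fp16, matching the model's parameters.
inputs = inputs.to("cuda", torch.float16)

with torch.no_grad():
    logits = model(**inputs).logits
```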
And for tests, do you think I can just add a …
Certainly not 😄 fp32 is the default for pipelines and so this should be tested by default. We'll need to add tests which set …
Sure, I'll add that.
@amyeroberts do I need to test for numerical similarity in fp16, or just make sure the inference runs? Testing for numerical similarity is quite a lot of work and slow to run, as I have to download each model in each pipeline (I changed 14 of them), then get the expected logits, then check against them.
I have pushed tests for inference that do not check for numerical stability, using the existing code in the pipeline test mixin, except for a few models where the mixin's common methods are not available (…).
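For context, a hypothetical, simplified sketch of that kind of inference-only fp16 check. The real tests reuse the pipeline test mixin; the task, checkpoint, and assertions here are placeholders:

```python
# Hypothetical sketch - not the actual mixin-based test code.
# Assumes a CUDA device; the checkpoint and task are placeholders.
import torch
from transformers import pipeline

def test_image_classification_pipeline_fp16():
    pipe = pipeline(
        "image-classification",
        model="google/vit-base-patch16-224",
        torch_dtype=torch.float16,
        device=0,
    )
    outputs = pipe("http://images.cocodataset.org/val2017/000000039769.jpg")
    # Only assert that fp16 inference runs and returns predictions;
    # numerical agreement with fp32 is deliberately not checked.
    assert isinstance(outputs, list) and len(outputs) > 0
    assert all("label" in pred and "score" in pred for pred in outputs)
```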
@aliencaocao Great! At the moment, there are a few pipeline tests failing which will need to be resolved. To run slow tests locally, you can set the …
Yes, I will be resolving them, but I see one error with …
@aliencaocao For this test, as it's testing the pipeline rather than the model itself, we can change the checkpoint/architecture used. It will likely need other values to be updated alongside.
The fix for this failing test is:

relative_coords_table = relative_coords_table.to(next(self.continuous_position_bias_mlp.parameters()).dtype)

This casts the relative coordinates table to the dtype of the continuous position bias MLP's parameters, so the two match when the model runs in fp16. Should I make a new PR for this change or add it here?
Could you do this in a separate PR please? It'll be easier to track this way.
PR made: #31589
4 other failing tests can be fixed by #31590 - a small QoL improvement.
For the failing owlvit test, the model's implementation uses a hard-coded out-of-range value when masking logits, which breaks in fp16. Should we correct the impl? I can do a check on other models for similar issues. The fix should be quite simple, just …
@aliencaocao Yes, let's use …

Here, does "all models" refer to all owlvit checkpoints? If that's the case, then that's OK!
What I meant was all torch implementations of models in HF transformers, as more of them may be using a hard-coded out-of-range value for masking logits like …. Existing tests already check logits, so we will know if anything gets affected. Ideally, none of them should be affected if the masking is working as intended. Do you want a new PR to update the code for …
Up to you. Having it correct across all models is of course the dream, but it can be a bit laborious making sure this is correct everywhere (plus tests), and might not be worth it for low-use models. I'm happy to just have the change made for owlvit here, and then we can think about other models' compatibility with fp16 if users raise it.
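The exact replacement value is elided in the comments above, but as a general illustration of the kind of dtype-aware masking being discussed (my sketch, not the actual owlvit diff):

```python
# Illustration only - not the owlvit code itself.
import torch

logits = torch.randn(2, 4, dtype=torch.float16)
query_mask = torch.tensor([[1, 1, 0, 1], [1, 0, 1, 1]], dtype=torch.bool)

# A hard-coded fill value such as -1e6 is out of range for fp16 (max ~65504),
# so deriving the fill value from the tensor's own dtype keeps the masking
# valid at any precision.
masked_logits = logits.masked_fill(~query_mask, torch.finfo(logits.dtype).min)
```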
# Conflicts:
#	tests/pipelines/test_pipelines_feature_extraction.py
#	tests/pipelines/test_pipelines_zero_shot_audio_classification.py
@amyeroberts the failing tf and onnx tests are due to some keras changes in https://github.com/keras-team/keras/releases/tag/v3.4.1. The failing torch pipeline test is due to a network timeout.
@aliencaocao Yes, unfortunately the keras update has broken everything 😭 We're working on a fix. I'll ping once resolved and hopefully then we can successfully re-run the CI for this PR.
@aliencaocao There have been fixes for keras and some of the timeout errors. Could you rebase to include these? That should then make all the CIs green.
@amyeroberts CI all green now.
Great work - thanks for adding!
What does this PR do?
Currently, if you pass torch_dtype=torch.float16 or set model=AutoModel.from_pretrained(..., torch_dtype=torch.float16) in the hope of using FP16 for inference in a Pipeline, it will fail: although the model is cast to FP16, inputs like image features stay in fp32, the default torch dtype.

This PR converts them accordingly. It only converts inputs that come out of an image_processor and have dtype torch.float32, so as not to accidentally touch things like token ids or boxes, which may be torch.int by intention. I have not checked pipelines involving audio inputs, but I would imagine some of them have the same issue.

I originally found this issue when using ZeroShotImageClassificationPipeline like this:
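(The original snippet did not survive the export; what follows is an illustrative reconstruction - the checkpoint, image URL, and candidate labels are my assumptions.)

```python
# Illustrative reconstruction, not the author's original snippet.
# Before this PR, passing torch_dtype=torch.float16 here failed because the
# pixel values produced by the image processor stayed in fp32.
import torch
from transformers import pipeline

classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-base-patch32",
    torch_dtype=torch.float16,
    device=0,
)
print(
    classifier(
        "http://images.cocodataset.org/val2017/000000039769.jpg",
        candidate_labels=["a photo of a cat", "a photo of a dog"],
    )
)
```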
Note that I have yet to write tests for it, as I want to make sure this is a valid issue and I am not just using Pipelines wrongly.

Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@Narsil