[CI] Quantization workflow #29046
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thank you very much! Can you try to trigger a run of the quantization slow tests and make sure we get the Slack notifications (after building the quantization docker image)? 🙏
For that, you can comment out all jobs except the quantization tests job in self-scheduled.yml and build-docker.yml, and modify the run condition on push to the name of the branch you are working on right now (there might be a better way to trigger only the quantization tests, but all tests are on the same workflow file).
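The trigger change described here could look roughly like this (a sketch only: the branch name is a placeholder, and the real self-scheduled.yml has more triggers):

```yaml
# Temporary change for debugging: run the workflow on pushes to a work branch
# instead of the usual triggers. Revert before merging.
on:
  push:
    branches:
      - my-quantization-ci-branch  # placeholder: your working branch name
```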
I left a comment! (This can be applied to all our workflow files that use that logic for checking out the current main branch.)
I see that … Edit: I tried to install python 3.10 but it didn't work.
The only reason I was able to run …
@SunMarc thanks!
Yes, that would be for the best! I prefer to keep running the quantization tests with python 3.8, since this is what we are actually doing for all transformers tests. Moreover, it will be better for the users since we keep the requirements low. LMK when it is done! @BlackSamorez
Perfect! I will wait for your PR to be merged and released then, if it doesn't take too much time. Please keep me updated! Otherwise, I can first merge this PR without aqlm and add it afterwards.
@SunMarc
I was able to build the image, but I don't have permission to push it. cc @ydshieh
@SunMarc Did you build/push via transformers' GitHub Actions? If so, do you have a job run link?
Yes! Here's the link to the job.
It's quite strange, because the workflow was indeed able to log in: https://github.com/huggingface/transformers/actions/runs/7979164961/job/21802381289#step:5:1 but it fails to push …
In TRL and PEFT I can confirm the build & push works fine: https://github.com/huggingface/peft/actions/workflows/build_docker_images.yml / https://github.com/huggingface/trl/actions/workflows/docker-build.yml, so it's not a token issue as we use the same token, unless it has expired for transformers. cc @glegendre01
Hi @SunMarc, it's because of some changes on the infra team's Docker Hub setup. I will ask about the repository.
BTW, I will review this PR tomorrow or Friday 🙏 |
Co-authored-by: Younes Belkada <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @SunMarc (and @younesbelkada) for this initiative.
I love this idea: the quantization tests keep growing in number, and their failures become hard to track inside the whole daily CI.
I have some suggestions and questions for the current version though.
RUN python3 -m pip install --no-cache-dir https://github.com/casper-hansen/AutoAWQ/releases/download/v0.1.8/autoawq-0.1.8+cu118-cp38-cp38-linux_x86_64.whl

# For bettertransformer + gptq
# For bettertransformer
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/optimum@main#egg=optimum
Do we count this as non-quantization tests?
I think we need to keep that as optimum is a hard requirement to use GPTQ
RUN [ ${#PYTORCH} -gt 0 ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; echo "export VERSION='$VERSION'" >> ~/.profile
RUN echo torch=$VERSION
# `torchvision` and `torchaudio` should be installed along with `torch`, especially for nightly build.
# Currently, let's just use their latest releases (when `torch` is installed with a release version)
RUN python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA
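The version-selection line in this hunk can be exercised standalone in a plain shell; a sketch (PYTORCH mirrors the Dockerfile build arg, set here to an example value):

```shell
# Mirror of the Dockerfile's version-selection logic:
# if the PYTORCH build arg is non-empty, pin torch to that release series,
# otherwise fall back to the latest release.
PYTORCH="2.2.0"   # example value; the build arg may also be empty
[ ${#PYTORCH} -gt 0 ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'
echo "$VERSION"   # prints torch==2.2.0.*
```

With PYTORCH empty, the same line falls through to the plain `torch` requirement, which is what lets the image double as a nightly/latest build.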
Make sure that at the end we do get torch 2.2. See my comment in the change in #29208. Otherwise, move this after the ./transformers[dev] step.
Done! I didn't have to move it after the ./transformers[dev] step.
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev]

RUN python3 -m pip uninstall -y flax jax
We can also uninstall tensorflow, or just use ./transformers[torch] above (but we might need to install other stuff manually, not sure). (Not a big deal, we can keep it if you are busy.)
I switched to transformers[dev-torch]!
latest-quantization-torch-docker:
  name: "Latest Pytorch + Quantization [dev]"
  # Push CI doesn't need this image
  if: inputs.image_postfix != '-push-ci'
Just want to know if we intend to run the quantization tests on a daily basis, or if we prefer to run them on each commit merged into main?
I think it is better to run them on a daily basis. There are many slow tests too, and I think the quantization tests are not easily broken by changes in transformers. The bigger issue is the third-party libraries that might introduce breaking changes. cc @younesbelkada
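A daily run would rely on the workflow's schedule trigger; a sketch (the cron expression here is a placeholder, the actual time configured in self-scheduled.yml may differ):

```yaml
on:
  schedule:
    # Placeholder time: run once a day at 02:00 UTC
    - cron: "0 2 * * *"
```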
Yes, a daily basis sounds great!
OK
run_tests_quantization_torch_gpu:
  name: Quantization tests
I am not sure we intend to keep this in this workflow file. Isn't the goal to move this out into a separate workflow file?
I think we can keep it there for now and I'll move it outside in my PR
@@ -1043,6 +1043,7 @@ def prepare_reports(title, header, reports, to_truncate=True):
    "PyTorch pipelines": "run_tests_torch_pipeline_gpu",
    "TensorFlow pipelines": "run_tests_tf_pipeline_gpu",
    "Torch CUDA extension tests": "run_tests_torch_cuda_extensions_gpu_test_reports",
    "Quantization tests": "run_tests_quantization_torch_gpu",
should be revised if we decide to move quantization tests outside this workflow file.
Preemptively approving, LGTM! Thanks @SunMarc for splitting the workflows!
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
  working-directory: /transformers
  run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
@ydshieh I do not understand why we don't:
1. Install from main during the docker build with pip install git+https://github.com/hugginface/transformers@main (and get all the necessary dependencies)
2. pip uninstall -y transformers in the docker
3. actions/checkout@v3
4. pip install -e .

(As discussed offline with @younesbelkada this should come in a separate PR)
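As workflow steps, that proposal might look roughly like this (a sketch only, not the PR's actual workflow; step names and the working directory are made up, and the install-from-main part would live in the Dockerfile instead):

```yaml
# Hypothetical job steps for the "install from main in the image, then
# reinstall the checked-out code in edit mode" approach
steps:
  - name: Check out the code under test
    uses: actions/checkout@v3
  - name: Reinstall transformers in edit mode
    run: |
      python3 -m pip uninstall -y transformers
      python3 -m pip install -e .
```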
There is no deep reason for what we are doing currently: we just install transformers during the docker image build and reuse it. What you suggest (installing main during the docker image build) has the consequence that we can't experiment easily in a branch when we change setup.py. By using actions/checkout@v3, the path to transformers will be different from the current approach, and therefore we would have to change all the working directory paths and artifact paths, otherwise the report will be empty.
Thank you for this work! I could RIP with the daily CI now.
latest-pytorch-deepspeed-amd:
  name: "PyTorch + DeepSpeed (AMD) [dev]"

  # runs-on: [self-hosted, docker-gpu, amd-gpu, single-gpu, mi210]
  # steps:
  #   - name: Set up Docker Buildx
  #     uses: docker/setup-buildx-action@v3
  #   - name: Check out code
  #     uses: actions/checkout@v3
  #   - name: Login to DockerHub
  #     uses: docker/login-action@v3
  #     with:
  #       username: ${{ secrets.DOCKERHUB_USERNAME }}
  #       password: ${{ secrets.DOCKERHUB_PASSWORD }}
  #   - name: Build and push
  #     uses: docker/build-push-action@v5
  #     with:
  #       context: ./docker/transformers-pytorch-deepspeed-amd-gpu
  #       build-args: |
  #         REF=main
  #       push: true
  #       tags: huggingface/transformers-pytorch-deepspeed-amd-gpu${{ inputs.image_postfix }}
  #   # Push CI images still need to be re-built daily
  #   - name: Build and push (for Push CI) in a daily basis
  #     # This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
  #     # The later case is useful for manual image building for debugging purpose. Use another tag in this case!
  #     if: inputs.image_postfix != '-push-ci'
  #     uses: docker/build-push-action@v5
  #     with:
  #       context: ./docker/transformers-pytorch-deepspeed-amd-gpu
  #       build-args: |
  #         REF=main
  #       push: true
  #       tags: huggingface/transformers-pytorch-deepspeed-amd-gpu-push-ci
  runs-on: [self-hosted, docker-gpu, amd-gpu, single-gpu, mi210]
  steps:
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3
    - name: Check out code
      uses: actions/checkout@v3
    - name: Login to DockerHub
      uses: docker/login-action@v3
      with:
        username: ${{ secrets.DOCKERHUB_USERNAME }}
        password: ${{ secrets.DOCKERHUB_PASSWORD }}
    - name: Build and push
      uses: docker/build-push-action@v5
      with:
        context: ./docker/transformers-pytorch-deepspeed-amd-gpu
        build-args: |
          REF=main
        push: true
        tags: huggingface/transformers-pytorch-deepspeed-amd-gpu${{ inputs.image_postfix }}
    # Push CI images still need to be re-built daily
    - name: Build and push (for Push CI) in a daily basis
      # This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
      # The later case is useful for manual image building for debugging purpose. Use another tag in this case!
      if: inputs.image_postfix != '-push-ci'
      uses: docker/build-push-action@v5
      with:
        context: ./docker/transformers-pytorch-deepspeed-amd-gpu
        build-args: |
          REF=main
        push: true
        tags: huggingface/transformers-pytorch-deepspeed-amd-gpu-push-ci
This is irrelevant to this PR. We have (or at least had) issues building AMD CI images, and we haven't really made progress on it.
Better to keep this as it is in this PR.
Oh, my bad! I'll revert the changes. Thanks for your careful review!
latest-quantization-torch-docker:
  name: "Latest Pytorch + Quantization [dev]"
  # Push CI doesn't need this image
  if: inputs.image_postfix != '-push-ci'
OK
I would love to see the newly added …
This reverts commit 4cb52b8.
* [CI] Quantization workflow
* build dockerfile
* fix dockerfile
* update self-cheduled.yml
* test build dockerfile on push
* fix torch install
* udapte to python 3.10
* update aqlm version
* uncomment build dockerfile
* tests if the scheduler works
* fix docker
* do not trigger on psuh again
* add additional runs
* test again
* all good
* style
* Update .github/workflows/self-scheduled.yml

Co-authored-by: Younes Belkada <[email protected]>

* test build dockerfile with torch 2.2.0
* fix extra
* clean
* revert changes
* Revert "revert changes"

This reverts commit 4cb52b8.

* revert correct change

---------

Co-authored-by: Younes Belkada <[email protected]>
What does this PR do?
This PR adds a workflow for quantization tests and the related dockerfile. Since we merged the HfQuantizer PR, the community has started integrating their own quantizers into transformers (e.g. AQLM, with many more to come). This will lead to many third-party libraries in the huggingface/transformers-all-latest-gpu Dockerfile. To limit the impact of these libraries on transformers tests, I propose creating a separate dockerfile + workflow.
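For illustration, a dedicated quantization image could start from a minimal Dockerfile along these lines (hypothetical sketch: the base image, REF handling, and package list are assumptions, not the PR's actual docker/ file, which remains authoritative):

```dockerfile
# Hypothetical minimal quantization image, not the PR's actual Dockerfile.
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
ARG REF=main
RUN apt update && apt install -y git python3 python3-pip
# Install transformers (torch dev extra) from the requested ref
RUN python3 -m pip install --no-cache-dir "git+https://github.com/huggingface/transformers@${REF}#egg=transformers[dev-torch]"
# Quantization backends live here, isolated from the all-latest-gpu image
# (package names are illustrative; pin versions in a real image)
RUN python3 -m pip install --no-cache-dir bitsandbytes optimum
```

Keeping the third-party quantization backends in their own image means a breaking release of any of them only affects this image and its test job, not the main CI image.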