Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance FBGEMM nightly CI #1407

Closed

Conversation

shintaro-iwasaki
Copy link
Contributor

@shintaro-iwasaki shintaro-iwasaki commented Oct 21, 2022

This PR updates FBGEMM nightly CI to make it more usable and maintainable by using GitHub Actions more effectively.

1. Implement label-triggered wheel tests during a PR review

Previously, we needed to apply a hack to trigger wheel-related tests before merging the PR, which is neither comprehensive nor efficient.

To address this issue, this PR adds optional wheel-related tests. Now a PR with a label such as "test_wheel_nightly", "test_wheel_prerelease", and "test_wheel_release" enables the corresponding wheel-related tests. We don't need to modify YML files again, so we can easily run these tests. Please trigger these tests for suspicious PRs that touch the installation logic. Note that binaries won't be pushed to PYPI if this method is used.

2. Remove duplicated logic across YML files

Nightly/release/cpu/gpu wheel scripts shared most in common, but the logic were scattered across different files, which significantly lowered maintainability.

To address this issue, this PR collects all the wheel-related test logic into a single file (build_wheel.yml) and makes the callers (e.g., scheduled nightly trigger or per-PR tests using labels) pass flags to control the flow (e.g., per-PR tests do not push the wheel binary to PYPI). No duplication improves maintainability.

Screen Shot 2022-10-21 at 9 58 56 AM

3. Support handy wheel-related tests on a local machine

Previously, the core wheel-related build/test logics were embedded into GitHub workflow files (*.yml) mixed with AWS-specific commands, so it was very tedious to test the nightly logic on a local machine, lowering the developer efficiency.

To address this issue, this PR extracts core scripts from GitHub workflow files and makes them standalone so that the developer can try wheel-build locally (i.e., without access to AWS machine). It uses conda to create a virtual software environment, so it should be handy enough in many cases though this is not as robust as a container-based solution.

# For example, check prerelease PyTorch (pytorch-test package) locally.
git clone https://github.com/pytorch/FBGEMM.git
cd FBGEMM
git submodule update --init --recursive
bash .github/scripts/build_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -o fbgemm_gpu_test
bash .github/scripts/test_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -w fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
git clone https://github.com/pytorch/torchrec.git
cd torchrec
git submodule update --init --recursive
bash ../.github/scripts/test_torchrec.bash -o torchrec_nightly -p 3.10 -c 11.7 -v -P pytorch-test -w ../fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl

4. [Temporary] Add TorchRec integration test

TorchRec is one of our most important users, but we didn't test TorchRec integration in the nightly CI, so the changes in FBGEMM sometimes surprised the TorchRec developers.

To address this issue, though it is temporary, but this PR adds a TorchRec integration test before pushing a binary to PYPI so that we can make sure that FBGEMM works with TorchRec. However, now broken TorchRec can break FBGEMM-nightly; we will investigate a more sustainable solution that makes everyone happy.

5. Better interface to manually trigger the CI

It looks nice. I love to operate the release management in an automated way as much as possible to reduce human errors at the very last minute. In the future, we can add more options if needed.

Screen Shot 2022-10-21 at 5 53 03 AM

@netlify
Copy link

netlify bot commented Oct 21, 2022

Deploy Preview for pytorch-fbgemm-docs canceled.

Name Link
🔨 Latest commit c7a3262
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/63b8a66a45045400085b1e41

@facebook-github-bot
Copy link
Contributor

@shintaro-iwasaki has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@shintaro-iwasaki shintaro-iwasaki changed the title [DO NOT MERGE] Test new CI logic Enhance FBGEMM nightly CI with cuDNN fix Oct 21, 2022
@facebook-github-bot
Copy link
Contributor

@shintaro-iwasaki has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@shintaro-iwasaki shintaro-iwasaki added enhancement New feature or request and removed enhancement New feature or request test_wheel_nightly labels Oct 21, 2022
@facebook-github-bot
Copy link
Contributor

@shintaro-iwasaki has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

shintaro-iwasaki added a commit to shintaro-iwasaki/FBGEMM that referenced this pull request Oct 21, 2022
Summary:
This PR updates FBGEMM nightly CI to make it more usable and maintainable by using GitHub Actions more effectively, with a cuDNN installation fix. I want to make them happen together so that the "fix" really fixes an issue.

## 1. Fix the cuDNN installation issue in FBGEMM nightly CI

We used `conda-forge` to install cuDNN, but this unintentionally added CUDA 10 dependency to the FBGEMM nightly package, breaking installation steps that rely on FBGEMM nightly.

To address this issue, we install cuDNN separately in the CI script so that it can coexist with CUDA 11.7, which is used by PyTorch-CUDA nightly.

## 2. Implement label-triggered wheel tests during a PR review

Previously, we needed to apply a hack to trigger wheel-related tests before merging the PR, which is neither comprehensive nor efficient.

To address this issue, this PR adds optional wheel-related tests.  Now a PR with a label such as `"test_wheel_nightly"`, `"test_wheel_prerelease"`, and `"test_wheel_release"` enables the corresponding wheel-related tests.  We don't need to modify YML files again, so we can easily run these tests.  Please trigger these tests for suspicious PRs that touch the installation logic.  Note that binaries won't be pushed to PYPI if this method is used.

## 3. Remove duplicated logic across YML files

Nightly/release/cpu/gpu wheel scripts shared most in common, but the logic were scattered across different files, which significantly lowered maintainability.

To address this issue, this PR collects all the wheel-related test logic into a single file (`build_wheel.yml`) and makes the callers (e.g., scheduled nightly trigger or per-PR tests using labels) pass flags to control the flow (e.g., per-PR tests do not push the wheel binary to PYPI).  No duplication improves maintainability.

<img width="1269" alt="Screen Shot 2022-10-21 at 9 58 56 AM" src="https://user-images.githubusercontent.com/15073003/197249555-bb00f3b1-9bc9-46f1-8fea-7874460723f6.png">

## 4. Support handy wheel-related tests on a local machine

Previously, the core wheel-related build/test logics were embedded into GitHub workflow files (`*.yml`) mixed with AWS-specific commands, so it was very tedious to test the nightly logic on a local machine, lowering the developer efficiency.

To address this issue, this PR extracts core scripts from GitHub workflow files and makes them standalone so that the developer can try wheel-build locally (i.e., without access to AWS machine). It uses conda to create a virtual software environment, so it should be handy enough in many cases though this is not as robust as a container-based solution.

```sh
# For example, check prerelease PyTorch (pytorch-test package) locally.
git clone https://github.com/pytorch/FBGEMM.git
cd FBGEMM
git submodule update --init --recursive
bash .github/scripts/build_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -o fbgemm_gpu_test
bash .github/scripts/test_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -w fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
git clone https://github.com/pytorch/torchrec.git
cd torchrec
git submodule update --init --recursive
bash ../.github/scripts/test_torchrec.bash -o torchrec_nightly -p 3.10 -c 11.7 -v -P pytorch-test -w ../fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
```

## 5. [Temporary] Add TorchRec integration test

TorchRec is one of our most important users, but we didn't test TorchRec integration in the nightly CI, so the changes in FBGEMM sometimes surprised the TorchRec developers.

To address this issue, though it is temporary, but this PR adds a TorchRec integration test before pushing a binary to PYPI so that we can make sure that FBGEMM works with TorchRec.  However, now broken TorchRec can break FBGEMM-nightly; we will investigate a more sustainable solution that makes everyone happy.

## 6. Better interface to manually trigger the CI

It looks nice. I love to operate the release management in an automated way as much as possible to reduce human errors at the very last minute. In the future, we can add more options if needed.

<img width="844" alt="Screen Shot 2022-10-21 at 5 53 03 AM" src="https://user-images.githubusercontent.com/15073003/197240190-e808df23-795a-451d-a7d9-ec1c43c68e92.png">

Pull Request resolved: pytorch#1407

Differential Revision: D40597700

Pulled By: shintaro-iwasaki

fbshipit-source-id: 150121661d58097164a589d506eeb842ee7ea905
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D40597700

shintaro-iwasaki added a commit to shintaro-iwasaki/FBGEMM that referenced this pull request Oct 22, 2022
Summary:
This PR updates FBGEMM nightly CI to make it more usable and maintainable by using GitHub Actions more effectively, with a cuDNN installation fix. I want to make them happen together so that the "fix" really fixes an issue.

## 1. Fix the cuDNN installation issue in FBGEMM nightly CI

We used `conda-forge` to install cuDNN, but this unintentionally added CUDA 10 dependency to the FBGEMM nightly package, breaking installation steps that rely on FBGEMM nightly.

To address this issue, we install cuDNN separately in the CI script so that it can coexist with CUDA 11.7, which is used by PyTorch-CUDA nightly.

## 2. Implement label-triggered wheel tests during a PR review

Previously, we needed to apply a hack to trigger wheel-related tests before merging the PR, which is neither comprehensive nor efficient.

To address this issue, this PR adds optional wheel-related tests.  Now a PR with a label such as `"test_wheel_nightly"`, `"test_wheel_prerelease"`, and `"test_wheel_release"` enables the corresponding wheel-related tests.  We don't need to modify YML files again, so we can easily run these tests.  Please trigger these tests for suspicious PRs that touch the installation logic.  Note that binaries won't be pushed to PYPI if this method is used.

## 3. Remove duplicated logic across YML files

Nightly/release/cpu/gpu wheel scripts shared most in common, but the logic were scattered across different files, which significantly lowered maintainability.

To address this issue, this PR collects all the wheel-related test logic into a single file (`build_wheel.yml`) and makes the callers (e.g., scheduled nightly trigger or per-PR tests using labels) pass flags to control the flow (e.g., per-PR tests do not push the wheel binary to PYPI).  No duplication improves maintainability.

<img width="1269" alt="Screen Shot 2022-10-21 at 9 58 56 AM" src="https://user-images.githubusercontent.com/15073003/197249555-bb00f3b1-9bc9-46f1-8fea-7874460723f6.png">

## 4. Support handy wheel-related tests on a local machine

Previously, the core wheel-related build/test logics were embedded into GitHub workflow files (`*.yml`) mixed with AWS-specific commands, so it was very tedious to test the nightly logic on a local machine, lowering the developer efficiency.

To address this issue, this PR extracts core scripts from GitHub workflow files and makes them standalone so that the developer can try wheel-build locally (i.e., without access to AWS machine). It uses conda to create a virtual software environment, so it should be handy enough in many cases though this is not as robust as a container-based solution.

```sh
# For example, check prerelease PyTorch (pytorch-test package) locally.
git clone https://github.com/pytorch/FBGEMM.git
cd FBGEMM
git submodule update --init --recursive
bash .github/scripts/build_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -o fbgemm_gpu_test
bash .github/scripts/test_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -w fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
git clone https://github.com/pytorch/torchrec.git
cd torchrec
git submodule update --init --recursive
bash ../.github/scripts/test_torchrec.bash -o torchrec_nightly -p 3.10 -c 11.7 -v -P pytorch-test -w ../fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
```

## 5. [Temporary] Add TorchRec integration test

TorchRec is one of our most important users, but we didn't test TorchRec integration in the nightly CI, so the changes in FBGEMM sometimes surprised the TorchRec developers.

To address this issue, though it is temporary, but this PR adds a TorchRec integration test before pushing a binary to PYPI so that we can make sure that FBGEMM works with TorchRec.  However, now broken TorchRec can break FBGEMM-nightly; we will investigate a more sustainable solution that makes everyone happy.

## 6. Better interface to manually trigger the CI

It looks nice. I love to operate the release management in an automated way as much as possible to reduce human errors at the very last minute. In the future, we can add more options if needed.

<img width="844" alt="Screen Shot 2022-10-21 at 5 53 03 AM" src="https://user-images.githubusercontent.com/15073003/197240190-e808df23-795a-451d-a7d9-ec1c43c68e92.png">

Pull Request resolved: pytorch#1407

Differential Revision: D40597700

Pulled By: shintaro-iwasaki

fbshipit-source-id: 425ef0be6c652bc7294f6940adb0a27fc96b353d
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D40597700

shintaro-iwasaki added a commit to shintaro-iwasaki/FBGEMM that referenced this pull request Oct 24, 2022
Summary:
This PR updates FBGEMM nightly CI to make it more usable and maintainable by using GitHub Actions more effectively, with a cuDNN installation fix. I want to make them happen together so that the "fix" really fixes an issue.

## 1. Fix the cuDNN installation issue in FBGEMM nightly CI

We used `conda-forge` to install cuDNN, but this unintentionally added CUDA 10 dependency to the FBGEMM nightly package, breaking installation steps that rely on FBGEMM nightly.

To address this issue, we install cuDNN separately in the CI script so that it can coexist with CUDA 11.7, which is used by PyTorch-CUDA nightly.

## 2. Implement label-triggered wheel tests during a PR review

Previously, we needed to apply a hack to trigger wheel-related tests before merging the PR, which is neither comprehensive nor efficient.

To address this issue, this PR adds optional wheel-related tests.  Now a PR with a label such as `"test_wheel_nightly"`, `"test_wheel_prerelease"`, and `"test_wheel_release"` enables the corresponding wheel-related tests.  We don't need to modify YML files again, so we can easily run these tests.  Please trigger these tests for suspicious PRs that touch the installation logic.  Note that binaries won't be pushed to PYPI if this method is used.

## 3. Remove duplicated logic across YML files

Nightly/release/cpu/gpu wheel scripts shared most in common, but the logic were scattered across different files, which significantly lowered maintainability.

To address this issue, this PR collects all the wheel-related test logic into a single file (`build_wheel.yml`) and makes the callers (e.g., scheduled nightly trigger or per-PR tests using labels) pass flags to control the flow (e.g., per-PR tests do not push the wheel binary to PYPI).  No duplication improves maintainability.

<img width="1269" alt="Screen Shot 2022-10-21 at 9 58 56 AM" src="https://user-images.githubusercontent.com/15073003/197249555-bb00f3b1-9bc9-46f1-8fea-7874460723f6.png">

## 4. Support handy wheel-related tests on a local machine

Previously, the core wheel-related build/test logics were embedded into GitHub workflow files (`*.yml`) mixed with AWS-specific commands, so it was very tedious to test the nightly logic on a local machine, lowering the developer efficiency.

To address this issue, this PR extracts core scripts from GitHub workflow files and makes them standalone so that the developer can try wheel-build locally (i.e., without access to AWS machine). It uses conda to create a virtual software environment, so it should be handy enough in many cases though this is not as robust as a container-based solution.

```sh
# For example, check prerelease PyTorch (pytorch-test package) locally.
git clone https://github.com/pytorch/FBGEMM.git
cd FBGEMM
git submodule update --init --recursive
bash .github/scripts/build_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -o fbgemm_gpu_test
bash .github/scripts/test_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -w fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
git clone https://github.com/pytorch/torchrec.git
cd torchrec
git submodule update --init --recursive
bash ../.github/scripts/test_torchrec.bash -o torchrec_nightly -p 3.10 -c 11.7 -v -P pytorch-test -w ../fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
```

## 5. [Temporary] Add TorchRec integration test

TorchRec is one of our most important users, but we didn't test TorchRec integration in the nightly CI, so the changes in FBGEMM sometimes surprised the TorchRec developers.

To address this issue, though it is temporary, but this PR adds a TorchRec integration test before pushing a binary to PYPI so that we can make sure that FBGEMM works with TorchRec.  However, now broken TorchRec can break FBGEMM-nightly; we will investigate a more sustainable solution that makes everyone happy.

## 6. Better interface to manually trigger the CI

It looks nice. I love to operate the release management in an automated way as much as possible to reduce human errors at the very last minute. In the future, we can add more options if needed.

<img width="844" alt="Screen Shot 2022-10-21 at 5 53 03 AM" src="https://user-images.githubusercontent.com/15073003/197240190-e808df23-795a-451d-a7d9-ec1c43c68e92.png">

Pull Request resolved: pytorch#1407

Differential Revision: D40597700

Pulled By: shintaro-iwasaki

fbshipit-source-id: b0e58ab69a5d313017bafefcbc509e9b47b18e41
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D40597700

1 similar comment
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D40597700

shintaro-iwasaki added a commit to shintaro-iwasaki/FBGEMM that referenced this pull request Oct 27, 2022
Summary:
This PR updates FBGEMM nightly CI to make it more usable and maintainable by using GitHub Actions more effectively.

## 1. Implement label-triggered wheel tests during a PR review

Previously, we needed to apply a hack to trigger wheel-related tests before merging the PR, which is neither comprehensive nor efficient.

To address this issue, this PR adds optional wheel-related tests.  Now a PR with a label such as `"test_wheel_nightly"`, `"test_wheel_prerelease"`, and `"test_wheel_release"` enables the corresponding wheel-related tests.  We don't need to modify YML files again, so we can easily run these tests.  Please trigger these tests for suspicious PRs that touch the installation logic.  Note that binaries won't be pushed to PYPI if this method is used.

## 2. Remove duplicated logic across YML files

Nightly/release/cpu/gpu wheel scripts shared most in common, but the logic were scattered across different files, which significantly lowered maintainability.

To address this issue, this PR collects all the wheel-related test logic into a single file (`build_wheel.yml`) and makes the callers (e.g., scheduled nightly trigger or per-PR tests using labels) pass flags to control the flow (e.g., per-PR tests do not push the wheel binary to PYPI).  No duplication improves maintainability.

<img width="1269" alt="Screen Shot 2022-10-21 at 9 58 56 AM" src="https://user-images.githubusercontent.com/15073003/197249555-bb00f3b1-9bc9-46f1-8fea-7874460723f6.png">

## 3. Support handy wheel-related tests on a local machine

Previously, the core wheel-related build/test logics were embedded into GitHub workflow files (`*.yml`) mixed with AWS-specific commands, so it was very tedious to test the nightly logic on a local machine, lowering the developer efficiency.

To address this issue, this PR extracts core scripts from GitHub workflow files and makes them standalone so that the developer can try wheel-build locally (i.e., without access to AWS machine). It uses conda to create a virtual software environment, so it should be handy enough in many cases though this is not as robust as a container-based solution.

```sh
# For example, check prerelease PyTorch (pytorch-test package) locally.
git clone https://github.com/pytorch/FBGEMM.git
cd FBGEMM
git submodule update --init --recursive
bash .github/scripts/build_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -o fbgemm_gpu_test
bash .github/scripts/test_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -w fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
git clone https://github.com/pytorch/torchrec.git
cd torchrec
git submodule update --init --recursive
bash ../.github/scripts/test_torchrec.bash -o torchrec_nightly -p 3.10 -c 11.7 -v -P pytorch-test -w ../fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
```

## 4. [Temporary] Add TorchRec integration test

TorchRec is one of our most important users, but we didn't test TorchRec integration in the nightly CI, so the changes in FBGEMM sometimes surprised the TorchRec developers.

To address this issue, though it is temporary, but this PR adds a TorchRec integration test before pushing a binary to PYPI so that we can make sure that FBGEMM works with TorchRec.  However, now broken TorchRec can break FBGEMM-nightly; we will investigate a more sustainable solution that makes everyone happy.

## 5. Better interface to manually trigger the CI

It looks nice. I love to operate the release management in an automated way as much as possible to reduce human errors at the very last minute. In the future, we can add more options if needed.

<img width="844" alt="Screen Shot 2022-10-21 at 5 53 03 AM" src="https://user-images.githubusercontent.com/15073003/197240190-e808df23-795a-451d-a7d9-ec1c43c68e92.png">

Pull Request resolved: pytorch#1407

Differential Revision: D40597700

Pulled By: shintaro-iwasaki

fbshipit-source-id: f5d56c0c5b91bf35e764496a76131e8defb7e605
shintaro-iwasaki added a commit to shintaro-iwasaki/FBGEMM that referenced this pull request Oct 28, 2022
Summary:
This PR updates FBGEMM nightly CI to make it more usable and maintainable by using GitHub Actions more effectively.

## 1. Implement label-triggered wheel tests during a PR review

Previously, we needed to apply a hack to trigger wheel-related tests before merging the PR, which is neither comprehensive nor efficient.

To address this issue, this PR adds optional wheel-related tests.  Now a PR with a label such as `"test_wheel_nightly"`, `"test_wheel_prerelease"`, and `"test_wheel_release"` enables the corresponding wheel-related tests.  We don't need to modify YML files again, so we can easily run these tests.  Please trigger these tests for suspicious PRs that touch the installation logic.  Note that binaries won't be pushed to PYPI if this method is used.

## 2. Remove duplicated logic across YML files

Nightly/release/cpu/gpu wheel scripts shared most in common, but the logic were scattered across different files, which significantly lowered maintainability.

To address this issue, this PR collects all the wheel-related test logic into a single file (`build_wheel.yml`) and makes the callers (e.g., scheduled nightly trigger or per-PR tests using labels) pass flags to control the flow (e.g., per-PR tests do not push the wheel binary to PYPI).  No duplication improves maintainability.

<img width="1269" alt="Screen Shot 2022-10-21 at 9 58 56 AM" src="https://user-images.githubusercontent.com/15073003/197249555-bb00f3b1-9bc9-46f1-8fea-7874460723f6.png">

## 3. Support handy wheel-related tests on a local machine

Previously, the core wheel-related build/test logics were embedded into GitHub workflow files (`*.yml`) mixed with AWS-specific commands, so it was very tedious to test the nightly logic on a local machine, lowering the developer efficiency.

To address this issue, this PR extracts core scripts from GitHub workflow files and makes them standalone so that the developer can try wheel-build locally (i.e., without access to AWS machine). It uses conda to create a virtual software environment, so it should be handy enough in many cases though this is not as robust as a container-based solution.

```sh
# For example, check prerelease PyTorch (pytorch-test package) locally.
git clone https://github.com/pytorch/FBGEMM.git
cd FBGEMM
git submodule update --init --recursive
bash .github/scripts/build_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -o fbgemm_gpu_test
bash .github/scripts/test_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -w fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
git clone https://github.com/pytorch/torchrec.git
cd torchrec
git submodule update --init --recursive
bash ../.github/scripts/test_torchrec.bash -o torchrec_nightly -p 3.10 -c 11.7 -v -P pytorch-test -w ../fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
```

## 4. [Temporary] Add TorchRec integration test

TorchRec is one of our most important users, but we didn't test TorchRec integration in the nightly CI, so the changes in FBGEMM sometimes surprised the TorchRec developers.

To address this issue, though it is temporary, but this PR adds a TorchRec integration test before pushing a binary to PYPI so that we can make sure that FBGEMM works with TorchRec.  However, now broken TorchRec can break FBGEMM-nightly; we will investigate a more sustainable solution that makes everyone happy.

## 5. Better interface to manually trigger the CI

It looks nice. I love to operate the release management in an automated way as much as possible to reduce human errors at the very last minute. In the future, we can add more options if needed.

<img width="844" alt="Screen Shot 2022-10-21 at 5 53 03 AM" src="https://user-images.githubusercontent.com/15073003/197240190-e808df23-795a-451d-a7d9-ec1c43c68e92.png">

Pull Request resolved: pytorch#1407

Differential Revision: D40597700

Pulled By: shintaro-iwasaki

fbshipit-source-id: 1b49b09a5f724ccfeb2e86fc0848e3f1dca0ef3c
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D40597700

shintaro-iwasaki added a commit to shintaro-iwasaki/FBGEMM that referenced this pull request Oct 31, 2022
Summary:
This PR updates FBGEMM nightly CI to make it more usable and maintainable by using GitHub Actions more effectively.

## 1. Implement label-triggered wheel tests during a PR review

Previously, we needed to apply a hack to trigger wheel-related tests before merging the PR, which is neither comprehensive nor efficient.

To address this issue, this PR adds optional wheel-related tests.  Now a PR with a label such as `"test_wheel_nightly"`, `"test_wheel_prerelease"`, and `"test_wheel_release"` enables the corresponding wheel-related tests.  We don't need to modify YML files again, so we can easily run these tests.  Please trigger these tests for suspicious PRs that touch the installation logic.  Note that binaries won't be pushed to PYPI if this method is used.

## 2. Remove duplicated logic across YML files

Nightly/release/cpu/gpu wheel scripts shared most in common, but the logic were scattered across different files, which significantly lowered maintainability.

To address this issue, this PR collects all the wheel-related test logic into a single file (`build_wheel.yml`) and makes the callers (e.g., scheduled nightly trigger or per-PR tests using labels) pass flags to control the flow (e.g., per-PR tests do not push the wheel binary to PYPI).  No duplication improves maintainability.

<img width="1269" alt="Screen Shot 2022-10-21 at 9 58 56 AM" src="https://user-images.githubusercontent.com/15073003/197249555-bb00f3b1-9bc9-46f1-8fea-7874460723f6.png">

## 3. Support handy wheel-related tests on a local machine

Previously, the core wheel-related build/test logics were embedded into GitHub workflow files (`*.yml`) mixed with AWS-specific commands, so it was very tedious to test the nightly logic on a local machine, lowering the developer efficiency.

To address this issue, this PR extracts core scripts from GitHub workflow files and makes them standalone so that the developer can try wheel-build locally (i.e., without access to AWS machine). It uses conda to create a virtual software environment, so it should be handy enough in many cases though this is not as robust as a container-based solution.

```sh
# For example, check prerelease PyTorch (pytorch-test package) locally.
git clone https://github.com/pytorch/FBGEMM.git
cd FBGEMM
git submodule update --init --recursive
bash .github/scripts/build_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -o fbgemm_gpu_test
bash .github/scripts/test_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -w fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
git clone https://github.com/pytorch/torchrec.git
cd torchrec
git submodule update --init --recursive
bash ../.github/scripts/test_torchrec.bash -o torchrec_nightly -p 3.10 -c 11.7 -v -P pytorch-test -w ../fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
```

## 4. [Temporary] Add TorchRec integration test

TorchRec is one of our most important users, but we didn't test TorchRec integration in the nightly CI, so the changes in FBGEMM sometimes surprised the TorchRec developers.

To address this issue, though it is temporary, but this PR adds a TorchRec integration test before pushing a binary to PYPI so that we can make sure that FBGEMM works with TorchRec.  However, now broken TorchRec can break FBGEMM-nightly; we will investigate a more sustainable solution that makes everyone happy.

## 5. Better interface to manually trigger the CI

It looks nice. I love to operate the release management in an automated way as much as possible to reduce human errors at the very last minute. In the future, we can add more options if needed.

<img width="844" alt="Screen Shot 2022-10-21 at 5 53 03 AM" src="https://user-images.githubusercontent.com/15073003/197240190-e808df23-795a-451d-a7d9-ec1c43c68e92.png">

Pull Request resolved: pytorch#1407

Differential Revision: D40597700

Pulled By: shintaro-iwasaki

fbshipit-source-id: 1b49b09a5f724ccfeb2e86fc0848e3f1dca0ef3c
@facebook-github-bot
Copy link
Contributor

@shintaro-iwasaki has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

shintaro-iwasaki added a commit to shintaro-iwasaki/FBGEMM that referenced this pull request Nov 1, 2022
Summary:
This PR updates FBGEMM nightly CI to make it more usable and maintainable by using GitHub Actions more effectively.

## 1. Implement label-triggered wheel tests during a PR review

Previously, we needed to apply a hack to trigger wheel-related tests before merging the PR, which is neither comprehensive nor efficient.

To address this issue, this PR adds optional wheel-related tests.  Now a PR with a label such as `"test_wheel_nightly"`, `"test_wheel_prerelease"`, and `"test_wheel_release"` enables the corresponding wheel-related tests.  We don't need to modify YML files again, so we can easily run these tests.  Please trigger these tests for suspicious PRs that touch the installation logic.  Note that binaries won't be pushed to PYPI if this method is used.

## 2. Remove duplicated logic across YML files

Nightly/release/cpu/gpu wheel scripts shared most in common, but the logic were scattered across different files, which significantly lowered maintainability.

To address this issue, this PR collects all the wheel-related test logic into a single file (`build_wheel.yml`) and makes the callers (e.g., scheduled nightly trigger or per-PR tests using labels) pass flags to control the flow (e.g., per-PR tests do not push the wheel binary to PYPI).  No duplication improves maintainability.

<img width="1269" alt="Screen Shot 2022-10-21 at 9 58 56 AM" src="https://user-images.githubusercontent.com/15073003/197249555-bb00f3b1-9bc9-46f1-8fea-7874460723f6.png">

## 3. Support handy wheel-related tests on a local machine

Previously, the core wheel-related build/test logics were embedded into GitHub workflow files (`*.yml`) mixed with AWS-specific commands, so it was very tedious to test the nightly logic on a local machine, lowering the developer efficiency.

To address this issue, this PR extracts core scripts from GitHub workflow files and makes them standalone so that the developer can try wheel-build locally (i.e., without access to AWS machine). It uses conda to create a virtual software environment, so it should be handy enough in many cases though this is not as robust as a container-based solution.

```sh
# For example, check prerelease PyTorch (pytorch-test package) locally.
git clone https://github.com/pytorch/FBGEMM.git
cd FBGEMM
git submodule update --init --recursive
bash .github/scripts/build_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -o fbgemm_gpu_test
bash .github/scripts/test_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -w fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
git clone https://github.com/pytorch/torchrec.git
cd torchrec
git submodule update --init --recursive
bash ../.github/scripts/test_torchrec.bash -o torchrec_nightly -p 3.10 -c 11.7 -v -P pytorch-test -w ../fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
```

## 4. [Temporary] Add TorchRec integration test

TorchRec is one of our most important users, but we didn't test TorchRec integration in the nightly CI, so the changes in FBGEMM sometimes surprised the TorchRec developers.

To address this issue, though it is temporary, but this PR adds a TorchRec integration test before pushing a binary to PYPI so that we can make sure that FBGEMM works with TorchRec.  However, now broken TorchRec can break FBGEMM-nightly; we will investigate a more sustainable solution that makes everyone happy.

## 5. Better interface to manually trigger the CI

It looks nice. I love to operate the release management in an automated way as much as possible to reduce human errors at the very last minute. In the future, we can add more options if needed.

<img width="844" alt="Screen Shot 2022-10-21 at 5 53 03 AM" src="https://user-images.githubusercontent.com/15073003/197240190-e808df23-795a-451d-a7d9-ec1c43c68e92.png">

Pull Request resolved: pytorch#1407

Differential Revision: D40597700

Pulled By: shintaro-iwasaki

fbshipit-source-id: 1b49b09a5f724ccfeb2e86fc0848e3f1dca0ef3c
@facebook-github-bot
Copy link
Contributor

@shintaro-iwasaki has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

shintaro-iwasaki added a commit to shintaro-iwasaki/FBGEMM that referenced this pull request Nov 2, 2022
Summary:
This PR updates FBGEMM nightly CI to make it more usable and maintainable by using GitHub Actions more effectively.

## 1. Implement label-triggered wheel tests during a PR review

Previously, we needed to apply a hack to trigger wheel-related tests before merging the PR, which is neither comprehensive nor efficient.

To address this issue, this PR adds optional wheel-related tests.  Now a PR with a label such as `"test_wheel_nightly"`, `"test_wheel_prerelease"`, and `"test_wheel_release"` enables the corresponding wheel-related tests.  We don't need to modify YML files again, so we can easily run these tests.  Please trigger these tests for suspicious PRs that touch the installation logic.  Note that binaries won't be pushed to PYPI if this method is used.

## 2. Remove duplicated logic across YML files

Nightly/release/cpu/gpu wheel scripts shared most in common, but the logic were scattered across different files, which significantly lowered maintainability.

To address this issue, this PR collects all the wheel-related test logic into a single file (`build_wheel.yml`) and makes the callers (e.g., scheduled nightly trigger or per-PR tests using labels) pass flags to control the flow (e.g., per-PR tests do not push the wheel binary to PYPI).  No duplication improves maintainability.

<img width="1269" alt="Screen Shot 2022-10-21 at 9 58 56 AM" src="https://user-images.githubusercontent.com/15073003/197249555-bb00f3b1-9bc9-46f1-8fea-7874460723f6.png">

## 3. Support handy wheel-related tests on a local machine

Previously, the core wheel-related build/test logics were embedded into GitHub workflow files (`*.yml`) mixed with AWS-specific commands, so it was very tedious to test the nightly logic on a local machine, lowering the developer efficiency.

To address this issue, this PR extracts core scripts from GitHub workflow files and makes them standalone so that the developer can try wheel-build locally (i.e., without access to AWS machine). It uses conda to create a virtual software environment, so it should be handy enough in many cases though this is not as robust as a container-based solution.

```sh
# For example, check prerelease PyTorch (pytorch-test package) locally.
git clone https://github.com/pytorch/FBGEMM.git
cd FBGEMM
git submodule update --init --recursive
bash .github/scripts/build_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -o fbgemm_gpu_test
bash .github/scripts/test_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -w fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
git clone https://github.com/pytorch/torchrec.git
cd torchrec
git submodule update --init --recursive
bash ../.github/scripts/test_torchrec.bash -o torchrec_nightly -p 3.10 -c 11.7 -v -P pytorch-test -w ../fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
```

## 4. [Temporary] Add TorchRec integration test

TorchRec is one of our most important users, but we didn't test TorchRec integration in the nightly CI, so the changes in FBGEMM sometimes surprised the TorchRec developers.

To address this issue, though it is temporary, but this PR adds a TorchRec integration test before pushing a binary to PYPI so that we can make sure that FBGEMM works with TorchRec.  However, now broken TorchRec can break FBGEMM-nightly; we will investigate a more sustainable solution that makes everyone happy.

## 5. Better interface to manually trigger the CI

It looks nice. I love to operate the release management in an automated way as much as possible to reduce human errors at the very last minute. In the future, we can add more options if needed.

<img width="844" alt="Screen Shot 2022-10-21 at 5 53 03 AM" src="https://user-images.githubusercontent.com/15073003/197240190-e808df23-795a-451d-a7d9-ec1c43c68e92.png">

Pull Request resolved: pytorch#1407

Differential Revision: D40597700

Pulled By: shintaro-iwasaki

fbshipit-source-id: 1b49b09a5f724ccfeb2e86fc0848e3f1dca0ef3c
@shintaro-iwasaki shintaro-iwasaki changed the title Enhance FBGEMM nightly CI with cuDNN fix Enhance FBGEMM nightly CI Nov 2, 2022
@facebook-github-bot
Copy link
Contributor

@shintaro-iwasaki has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

shintaro-iwasaki added a commit to shintaro-iwasaki/FBGEMM that referenced this pull request Nov 2, 2022
Summary:
This PR updates FBGEMM nightly CI to make it more usable and maintainable by using GitHub Actions more effectively.

## 1. Implement label-triggered wheel tests during a PR review

Previously, we needed to apply a hack to trigger wheel-related tests before merging the PR, which is neither comprehensive nor efficient.

To address this issue, this PR adds optional wheel-related tests.  Now a PR with a label such as `"test_wheel_nightly"`, `"test_wheel_prerelease"`, and `"test_wheel_release"` enables the corresponding wheel-related tests.  We don't need to modify YML files again, so we can easily run these tests.  Please trigger these tests for suspicious PRs that touch the installation logic.  Note that binaries won't be pushed to PYPI if this method is used.

## 2. Remove duplicated logic across YML files

Nightly/release/cpu/gpu wheel scripts shared most in common, but the logic were scattered across different files, which significantly lowered maintainability.

To address this issue, this PR collects all the wheel-related test logic into a single file (`build_wheel.yml`) and makes the callers (e.g., scheduled nightly trigger or per-PR tests using labels) pass flags to control the flow (e.g., per-PR tests do not push the wheel binary to PYPI).  No duplication improves maintainability.

<img width="1269" alt="Screen Shot 2022-10-21 at 9 58 56 AM" src="https://user-images.githubusercontent.com/15073003/197249555-bb00f3b1-9bc9-46f1-8fea-7874460723f6.png">

## 3. Support handy wheel-related tests on a local machine

Previously, the core wheel-related build/test logics were embedded into GitHub workflow files (`*.yml`) mixed with AWS-specific commands, so it was very tedious to test the nightly logic on a local machine, lowering the developer efficiency.

To address this issue, this PR extracts core scripts from GitHub workflow files and makes them standalone so that the developer can try wheel-build locally (i.e., without access to AWS machine). It uses conda to create a virtual software environment, so it should be handy enough in many cases though this is not as robust as a container-based solution.

```sh
# For example, check prerelease PyTorch (pytorch-test package) locally.
git clone https://github.com/pytorch/FBGEMM.git
cd FBGEMM
git submodule update --init --recursive
bash .github/scripts/build_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -o fbgemm_gpu_test
bash .github/scripts/test_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -w fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
git clone https://github.com/pytorch/torchrec.git
cd torchrec
git submodule update --init --recursive
bash ../.github/scripts/test_torchrec.bash -o torchrec_nightly -p 3.10 -c 11.7 -v -P pytorch-test -w ../fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
```

## 4. [Temporary] Add TorchRec integration test

TorchRec is one of our most important users, but we didn't test TorchRec integration in the nightly CI, so the changes in FBGEMM sometimes surprised the TorchRec developers.

To address this issue, though it is temporary, but this PR adds a TorchRec integration test before pushing a binary to PYPI so that we can make sure that FBGEMM works with TorchRec.  However, now broken TorchRec can break FBGEMM-nightly; we will investigate a more sustainable solution that makes everyone happy.

## 5. Better interface to manually trigger the CI

It looks nice. I love to operate the release management in an automated way as much as possible to reduce human errors at the very last minute. In the future, we can add more options if needed.

<img width="844" alt="Screen Shot 2022-10-21 at 5 53 03 AM" src="https://user-images.githubusercontent.com/15073003/197240190-e808df23-795a-451d-a7d9-ec1c43c68e92.png">

Pull Request resolved: pytorch#1407

Differential Revision: D40597700

Pulled By: shintaro-iwasaki

fbshipit-source-id: 1b49b09a5f724ccfeb2e86fc0848e3f1dca0ef3c
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D40597700

shintaro-iwasaki added a commit to shintaro-iwasaki/FBGEMM that referenced this pull request Nov 28, 2022
Summary:
This PR updates FBGEMM nightly CI to make it more usable and maintainable by using GitHub Actions more effectively.

## 1. Implement label-triggered wheel tests during a PR review

Previously, we needed to apply a hack to trigger wheel-related tests before merging the PR, which is neither comprehensive nor efficient.

To address this issue, this PR adds optional wheel-related tests.  Now a PR with a label such as `"test_wheel_nightly"`, `"test_wheel_prerelease"`, and `"test_wheel_release"` enables the corresponding wheel-related tests.  We don't need to modify YML files again, so we can easily run these tests.  Please trigger these tests for suspicious PRs that touch the installation logic.  Note that binaries won't be pushed to PYPI if this method is used.

## 2. Remove duplicated logic across YML files

Nightly/release/cpu/gpu wheel scripts shared most in common, but the logic were scattered across different files, which significantly lowered maintainability.

To address this issue, this PR collects all the wheel-related test logic into a single file (`build_wheel.yml`) and makes the callers (e.g., scheduled nightly trigger or per-PR tests using labels) pass flags to control the flow (e.g., per-PR tests do not push the wheel binary to PYPI).  No duplication improves maintainability.

<img width="1269" alt="Screen Shot 2022-10-21 at 9 58 56 AM" src="https://user-images.githubusercontent.com/15073003/197249555-bb00f3b1-9bc9-46f1-8fea-7874460723f6.png">

## 3. Support handy wheel-related tests on a local machine

Previously, the core wheel-related build/test logics were embedded into GitHub workflow files (`*.yml`) mixed with AWS-specific commands, so it was very tedious to test the nightly logic on a local machine, lowering the developer efficiency.

To address this issue, this PR extracts core scripts from GitHub workflow files and makes them standalone so that the developer can try wheel-build locally (i.e., without access to AWS machine). It uses conda to create a virtual software environment, so it should be handy enough in many cases though this is not as robust as a container-based solution.

```sh
# For example, check prerelease PyTorch (pytorch-test package) locally.
git clone https://github.com/pytorch/FBGEMM.git
cd FBGEMM
git submodule update --init --recursive
bash .github/scripts/build_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -o fbgemm_gpu_test
bash .github/scripts/test_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -w fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
git clone https://github.com/pytorch/torchrec.git
cd torchrec
git submodule update --init --recursive
bash ../.github/scripts/test_torchrec.bash -o torchrec_nightly -p 3.10 -c 11.7 -v -P pytorch-test -w ../fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
```

## 4. [Temporary] Add TorchRec integration test

TorchRec is one of our most important users, but we didn't test TorchRec integration in the nightly CI, so the changes in FBGEMM sometimes surprised the TorchRec developers.

To address this issue, though it is temporary, but this PR adds a TorchRec integration test before pushing a binary to PYPI so that we can make sure that FBGEMM works with TorchRec.  However, now broken TorchRec can break FBGEMM-nightly; we will investigate a more sustainable solution that makes everyone happy.

## 5. Better interface to manually trigger the CI

It looks nice. I love to operate the release management in an automated way as much as possible to reduce human errors at the very last minute. In the future, we can add more options if needed.

<img width="844" alt="Screen Shot 2022-10-21 at 5 53 03 AM" src="https://user-images.githubusercontent.com/15073003/197240190-e808df23-795a-451d-a7d9-ec1c43c68e92.png">

Pull Request resolved: pytorch#1407

Differential Revision: D40597700

Pulled By: shintaro-iwasaki

fbshipit-source-id: daa72faa504cb3b91328f9baea5a26ec86f89199
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D40597700

shintaro-iwasaki added a commit to shintaro-iwasaki/FBGEMM that referenced this pull request Nov 28, 2022
Summary:
This PR updates FBGEMM nightly CI to make it more usable and maintainable by using GitHub Actions more effectively.

## 1. Implement label-triggered wheel tests during a PR review

Previously, we needed to apply a hack to trigger wheel-related tests before merging the PR, which is neither comprehensive nor efficient.

To address this issue, this PR adds optional wheel-related tests.  Now a PR with a label such as `"test_wheel_nightly"`, `"test_wheel_prerelease"`, and `"test_wheel_release"` enables the corresponding wheel-related tests.  We don't need to modify YML files again, so we can easily run these tests.  Please trigger these tests for suspicious PRs that touch the installation logic.  Note that binaries won't be pushed to PYPI if this method is used.

## 2. Remove duplicated logic across YML files

Nightly/release/cpu/gpu wheel scripts shared most in common, but the logic were scattered across different files, which significantly lowered maintainability.

To address this issue, this PR collects all the wheel-related test logic into a single file (`build_wheel.yml`) and makes the callers (e.g., scheduled nightly trigger or per-PR tests using labels) pass flags to control the flow (e.g., per-PR tests do not push the wheel binary to PYPI).  No duplication improves maintainability.

<img width="1269" alt="Screen Shot 2022-10-21 at 9 58 56 AM" src="https://user-images.githubusercontent.com/15073003/197249555-bb00f3b1-9bc9-46f1-8fea-7874460723f6.png">

## 3. Support handy wheel-related tests on a local machine

Previously, the core wheel-related build/test logics were embedded into GitHub workflow files (`*.yml`) mixed with AWS-specific commands, so it was very tedious to test the nightly logic on a local machine, lowering the developer efficiency.

To address this issue, this PR extracts core scripts from GitHub workflow files and makes them standalone so that the developer can try wheel-build locally (i.e., without access to AWS machine). It uses conda to create a virtual software environment, so it should be handy enough in many cases though this is not as robust as a container-based solution.

```sh
# For example, check prerelease PyTorch (pytorch-test package) locally.
git clone https://github.com/pytorch/FBGEMM.git
cd FBGEMM
git submodule update --init --recursive
bash .github/scripts/build_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -o fbgemm_gpu_test
bash .github/scripts/test_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -w fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
git clone https://github.com/pytorch/torchrec.git
cd torchrec
git submodule update --init --recursive
bash ../.github/scripts/test_torchrec.bash -o torchrec_nightly -p 3.10 -c 11.7 -v -P pytorch-test -w ../fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
```

## 4. [Temporary] Add TorchRec integration test

TorchRec is one of our most important users, but we didn't test TorchRec integration in the nightly CI, so the changes in FBGEMM sometimes surprised the TorchRec developers.

To address this issue, though it is temporary, but this PR adds a TorchRec integration test before pushing a binary to PYPI so that we can make sure that FBGEMM works with TorchRec.  However, now broken TorchRec can break FBGEMM-nightly; we will investigate a more sustainable solution that makes everyone happy.

## 5. Better interface to manually trigger the CI

It looks nice. I love to operate the release management in an automated way as much as possible to reduce human errors at the very last minute. In the future, we can add more options if needed.

<img width="844" alt="Screen Shot 2022-10-21 at 5 53 03 AM" src="https://user-images.githubusercontent.com/15073003/197240190-e808df23-795a-451d-a7d9-ec1c43c68e92.png">

Pull Request resolved: pytorch#1407

Reviewed By: jianyuh

Differential Revision: D40597700

Pulled By: shintaro-iwasaki

fbshipit-source-id: 10cc446ba855e76111805614e53821377ade4ed3
shintaro-iwasaki added a commit to shintaro-iwasaki/FBGEMM that referenced this pull request Dec 5, 2022
Summary:
This PR updates FBGEMM nightly CI to make it more usable and maintainable by using GitHub Actions more effectively.

## 1. Implement label-triggered wheel tests during a PR review

Previously, we needed to apply a hack to trigger wheel-related tests before merging the PR, which is neither comprehensive nor efficient.

To address this issue, this PR adds optional wheel-related tests.  Now a PR with a label such as `"test_wheel_nightly"`, `"test_wheel_prerelease"`, and `"test_wheel_release"` enables the corresponding wheel-related tests.  We don't need to modify YML files again, so we can easily run these tests.  Please trigger these tests for suspicious PRs that touch the installation logic.  Note that binaries won't be pushed to PYPI if this method is used.

## 2. Remove duplicated logic across YML files

Nightly/release/cpu/gpu wheel scripts shared most in common, but the logic were scattered across different files, which significantly lowered maintainability.

To address this issue, this PR collects all the wheel-related test logic into a single file (`build_wheel.yml`) and makes the callers (e.g., scheduled nightly trigger or per-PR tests using labels) pass flags to control the flow (e.g., per-PR tests do not push the wheel binary to PYPI).  No duplication improves maintainability.

<img width="1269" alt="Screen Shot 2022-10-21 at 9 58 56 AM" src="https://user-images.githubusercontent.com/15073003/197249555-bb00f3b1-9bc9-46f1-8fea-7874460723f6.png">

## 3. Support handy wheel-related tests on a local machine

Previously, the core wheel-related build/test logics were embedded into GitHub workflow files (`*.yml`) mixed with AWS-specific commands, so it was very tedious to test the nightly logic on a local machine, lowering the developer efficiency.

To address this issue, this PR extracts core scripts from GitHub workflow files and makes them standalone so that the developer can try wheel-build locally (i.e., without access to AWS machine). It uses conda to create a virtual software environment, so it should be handy enough in many cases though this is not as robust as a container-based solution.

```sh
# For example, check prerelease PyTorch (pytorch-test package) locally.
git clone https://github.com/pytorch/FBGEMM.git
cd FBGEMM
git submodule update --init --recursive
bash .github/scripts/build_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -o fbgemm_gpu_test
bash .github/scripts/test_wheel.bash -p 3.10 -c 11.7 -v -P pytorch-test -w fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
git clone https://github.com/pytorch/torchrec.git
cd torchrec
git submodule update --init --recursive
bash ../.github/scripts/test_torchrec.bash -o torchrec_nightly -p 3.10 -c 11.7 -v -P pytorch-test -w ../fbgemm_gpu/dist/fbgemm_gpu_test-2022.10.20-cp310-cp310-manylinux1_x86_64.whl
```

## 4. [Temporary] Add TorchRec integration test

TorchRec is one of our most important users, but we didn't test TorchRec integration in the nightly CI, so the changes in FBGEMM sometimes surprised the TorchRec developers.

To address this issue, though it is temporary, but this PR adds a TorchRec integration test before pushing a binary to PYPI so that we can make sure that FBGEMM works with TorchRec.  However, now broken TorchRec can break FBGEMM-nightly; we will investigate a more sustainable solution that makes everyone happy.

## 5. Better interface to manually trigger the CI

It looks nice. I love to operate the release management in an automated way as much as possible to reduce human errors at the very last minute. In the future, we can add more options if needed.

<img width="844" alt="Screen Shot 2022-10-21 at 5 53 03 AM" src="https://user-images.githubusercontent.com/15073003/197240190-e808df23-795a-451d-a7d9-ec1c43c68e92.png">

Pull Request resolved: pytorch#1407

Reviewed By: jianyuh

Differential Revision: D40597700

Pulled By: shintaro-iwasaki

fbshipit-source-id: 10cc446ba855e76111805614e53821377ade4ed3
@facebook-github-bot
Copy link
Contributor

@shintaro-iwasaki has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@shintaro-iwasaki merged this pull request in a8eeac7.

facebook-github-bot pushed a commit that referenced this pull request Jan 11, 2023
Summary:
## What does this PR do?

We are updating the release mechanism of the FBGEMM nightly build. The new mechanism (#1407) is cleaner and its tests are more comprehensive.

However, changing such a mechanism needs extra caution because it can break workloads that use FBGEMM nightly binaries. Actually, #1407 did not disable the existing mechanism.  After #1407 was merged, we need to check the following:
- [x] The new nightly job is triggered every day.
- [x] The new nightly job passes tests on GPUs stably.
- [ ] The new nightly job uploads wheel files to PyPI successfully.

The new nightly job seems working correctly (https://github.com/pytorch/FBGEMM/actions/runs/3882482573), so this PR checks the last point by disabling the trigger to the original mechanism and enabling the new one.

## What if the new nightly mechanism is not functional?

Nevertheless, the new mechanism might not work for some reason.  If the new nightly mechanism does not work for some reason, we must revert this PR, but even without doing so, we can trigger the existing `Push Binary Nightly` manually so that we can fall back to the current method.
...
I will carefully check it for a few days :)

Pull Request resolved: #1541

Reviewed By: mjanderson09

Differential Revision: D42438184

Pulled By: shintaro-iwasaki

fbshipit-source-id: 5a887c5c51cf1a9db99e69892ecf54b003ae86a2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants