Use pytorch binary for macos-arm64 workflow #1215

Merged: 1 commit, Aug 12, 2022

@sjain-stanford sjain-stanford commented Aug 12, 2022

The macos-arm64 build is flaky with "pytorch source" enabled: it passed on the original PR but has failed twice on main since, which adds overhead to landing PRs.

I'm leaning towards flipping this back to binary. This should help with the following:

  • reduce the number of long running pytorch source builds to one job (ubuntu-OOT)
  • reduce cache invalidations for macos builds to only llvm updates (which are weekly rather than hourly)
  • speed up the fail-fast signal, even though the macos jobs are decoupled from the ubuntu matrix jobs

@sjain-stanford sjain-stanford force-pushed the sambhav/pytorch_binary_macos branch from 3cd4e1f to d0220bf Compare August 12, 2022 12:29
@powderluv powderluv merged commit b8bd0a4 into llvm:main Aug 12, 2022
sjain-stanford added a commit that referenced this pull request Aug 12, 2022
My earlier [PR](#1213) had (among other things) decoupled the ubuntu and macos builds into separate matrix runs. This is not working well: the limited number of MacOS GHA VMs causes long queue times and a backlog. There are two reasons for this backlog:

1. macos arm64 builds with pytorch source are getting erratically cancelled due to resource / network constraints. This is addressed by #1215.

> "macos-arm64 (in-tree, OFF) The hosted runner: GitHub Actions 3 lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error."

2. macos runs don't fail fast when ubuntu runs fail, because the two run in separate matrix setups. This PR couples them again.
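
For illustration only, coupling the two platforms in one matrix might look roughly like the sketch below. The workflow name, matrix keys, runner labels, and `build.sh` script are assumptions made up for this example, not the actual torch-mlir workflow. It shows the two ideas discussed above: a single matrix, so `fail-fast` covers both platforms, and macos-arm64 using the pytorch binary while ubuntu keeps the one source build.

```yaml
# Hypothetical sketch: names, matrix keys, and the build script are illustrative.
name: build-and-test
on: [push, pull_request]

jobs:
  build:
    strategy:
      # When true, GitHub cancels the in-progress and queued jobs of this
      # matrix as soon as any one matrix job fails.
      fail-fast: true
      matrix:
        include:
          - os-arch: ubuntu-x86_64
            runs-on: ubuntu-latest
            torch-binary: 'OFF'   # the single long-running pytorch source build
          - os-arch: macos-arm64
            runs-on: macos-latest
            torch-binary: 'ON'    # prebuilt pytorch binary, no source build
    runs-on: ${{ matrix.runs-on }}
    steps:
      - uses: actions/checkout@v3
      - name: Build and test
        # build.sh and its flag are placeholders for the real build entry point.
        run: ./build.sh --torch-binary=${{ matrix.torch-binary }}
```

Because `fail-fast` only applies within a single matrix, jobs defined in a separate matrix keep running (or queuing) after an ubuntu failure, which is the behavior described in item 2.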
qedawkins pushed a commit to nod-ai/torch-mlir that referenced this pull request Oct 3, 2022
* Add git commit id to llvm.ident metadata
* Addressing review comment: llvm#1211 (comment)

Signed-off-by: Whitney Tsang <[email protected]>

Co-authored-by: Whitney Tsang <[email protected]>
Co-authored-by: Ettore Tiotto <[email protected]>
@sjain-stanford sjain-stanford deleted the sambhav/pytorch_binary_macos branch November 10, 2022 19:53