Simplify matrix configuration for CI workflows #1213
This is looking great. The arm64 PyTorch source build should work if you remove this one line: torch-mlir/build_tools/build_libtorch.sh, line 134 (at 51bfe25). It is trying to uninstall the system-wide torch.
…ch source build for arm64
Thanks, patched. Let's wait to see if the arm64 PyTorch source workflow goes through with the fix. If there are more errors, I can revert to the PyTorch binary and land it for now (to avoid cache evictions, which become more likely the longer this stays open). I'll wait for an "all green" CI before landing, but if this looks good otherwise, please feel free to ✅ this.
LGTM. Feel free to switch arm64 to the PyTorch source build in a follow-on.
Nicely done. The silly cache gets regenerated when it merges, so we have to wait for it again.
Ah, I was wondering why it didn't restore from the cache after landing, since the keys didn't change. Good to know this is normal. Maybe GHA treats runs on PRs differently from runs on pushes to main. Oh well...
... and thank you for the help reviewing it!
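The cache behavior described above is consistent with GitHub's documented cache-scoping rule, as I understand it: a run can restore caches created on its own ref, its base branch, or the default branch, so a cache saved under a PR's merge ref is not visible to the push to main after landing. Below is a simplified Python model of that lookup, assuming exact key matches only (prefix matching via restore keys is not modeled); the cache key and refs are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def restorable_scopes(current_ref: str, base_ref: Optional[str],
                      default_ref: str = "refs/heads/main") -> List[str]:
    """Scopes a run may read from, in lookup order (simplified model):
    its own ref, then its base branch, then the default branch."""
    scopes = [current_ref]
    for ref in (base_ref, default_ref):
        if ref is not None and ref not in scopes:
            scopes.append(ref)
    return scopes


def find_cache(key: str, caches: Dict[Tuple[str, str], str],
               scopes: List[str]) -> Optional[str]:
    """Return the first cache entry with an exact key match, trying scopes in order."""
    for scope in scopes:
        if (scope, key) in caches:
            return caches[(scope, key)]
    return None


# Hypothetical cache saved while the PR was open: it lives under the PR's merge ref.
caches = {("refs/pull/1213/merge", "llvm-build-abc123"): "cached build"}

# The push to main after landing only searches main, so it misses the PR-scoped entry.
main_scopes = restorable_scopes("refs/heads/main", None)
print(find_cache("llvm-build-abc123", caches, main_scopes))

# A later run on the same PR searches its own merge ref first and finds it.
pr_scopes = restorable_scopes("refs/pull/1213/merge", "refs/heads/main")
print(find_cache("llvm-build-abc123", caches, pr_scopes))
```

In this model the first run on main has to rebuild and save a main-scoped entry, which matches the "wait for it again" experience above.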
My earlier PR (#1213) had (among other things) decoupled the Ubuntu and macOS builds into separate matrix runs. This is not working well: the limited number of macOS GHA VMs causes long queue times and a backlog. There are two reasons for this backlog:
1. macOS arm64 builds with PyTorch source are getting erratically cancelled due to resource/network constraints. This is addressed by #1215:
   > "macos-arm64 (in-tree, OFF) The hosted runner: GitHub Actions 3 lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error."
2. macOS runs don't fail fast when Ubuntu runs fail, because they are in separate matrix setups.

This PR couples them again.
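To illustrate point 2: with fail-fast enabled (the GHA default), a failing job cancels the remaining queued and in-progress jobs only within its own matrix. The toy Python simulation below assumes jobs finish one at a time in a fixed order; the job names are hypothetical, and this is not how GitHub actually schedules runs.

```python
from typing import Dict, List, Set


def simulate_fail_fast(matrices: Dict[str, List[str]], failing: Set[str],
                       order: List[str]) -> Dict[str, str]:
    """Toy model: jobs finish one at a time in `order`. When a job fails,
    every still-queued job in the same matrix is cancelled; jobs in other
    matrices are unaffected."""
    matrix_of = {job: name for name, jobs in matrices.items() for job in jobs}
    status = {job: "queued" for job in matrix_of}
    for job in order:
        if status[job] != "queued":
            continue  # already cancelled by fail-fast
        status[job] = "failure" if job in failing else "success"
        if status[job] == "failure":
            for sibling in matrices[matrix_of[job]]:
                if status[sibling] == "queued":
                    status[sibling] = "cancelled"
    return status


order = ["ubuntu-x86_64", "macos-x86_64", "macos-arm64"]  # hypothetical job names
failing = {"ubuntu-x86_64"}

# Decoupled: separate matrices, so the macOS jobs still run after Ubuntu fails.
decoupled = {"ubuntu": ["ubuntu-x86_64"], "macos": ["macos-x86_64", "macos-arm64"]}
print(simulate_fail_fast(decoupled, failing, order))

# Coupled: one matrix, so the Ubuntu failure cancels the queued macOS jobs.
coupled = {"ci": order}
print(simulate_fail_fast(coupled, failing, order))
```

In the coupled case the scarce macOS runners are freed as soon as an Ubuntu job fails, which is the backlog relief this PR is after.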
Addresses #1207.
Provisioned jobs:
Main changes
os, targetarch, python-version, llvmtype.
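A GHA matrix expands into one job per combination of its axes, and each exclude entry then drops every combination it partially matches. Below is a minimal Python sketch of that expansion over the four axes listed above. The axis values and the exclude rule are hypothetical placeholders, not the values used in this PR, and include entries are not modeled.

```python
from itertools import product
from typing import Dict, List


def expand_matrix(axes: Dict[str, List[str]],
                  exclude: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Cartesian product of all axes, minus any combination that matches
    every key/value pair of at least one exclude entry (partial match)."""
    names = list(axes)
    combos = [dict(zip(names, values)) for values in product(*(axes[n] for n in names))]

    def excluded(combo: Dict[str, str]) -> bool:
        return any(all(combo.get(k) == v for k, v in rule.items()) for rule in exclude)

    return [c for c in combos if not excluded(c)]


# Hypothetical axis values for illustration only.
axes = {
    "os": ["ubuntu-latest", "macos-latest"],
    "targetarch": ["x86_64", "arm64"],
    "python-version": ["3.10"],
    "llvmtype": ["in-tree", "out-of-tree"],
}
# Hypothetical rule: no arm64 jobs on Ubuntu.
exclude = [{"os": "ubuntu-latest", "targetarch": "arm64"}]

jobs = expand_matrix(axes, exclude)
for job in jobs:
    print(job)
print(len(jobs))  # 2 * 2 * 1 * 2 = 8 combinations, minus the 2 excluded = 6
```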
Further improvements (to be addressed in a follow-on):
Passing workflow:
https://github.com/sjain-stanford/torch-mlir/actions/runs/2840676309