Bump LLVM to get bazel fixes #2517

Merged: 1 commit into llvm:main from sambhav/llvm_bump, Oct 18, 2023



@sjain-stanford sjain-stanford commented Oct 18, 2023

The last LLVM bump in #2511 pointed to llvm/llvm-project@b44b349; however, the upstream bazel build was not clean at that commit:

ERROR: /root/.cache/bazel/_bazel_root/b89349c08f7224396763d14fe35cba11/external/llvm-project/mlir/BUILD.bazel:5837:18: TdGenerate
external/llvm-project/mlir/include/mlir/Dialect/LLVMIR/NVVMOpsInterface.h.inc failed: (Exit 1): mlir-tblgen failed: error executing command ...

external/llvm-project/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td:20:9: error: Could not find include file 'mlir/Dialect/LLVMIR/BasicPtxBuilderInterface.td'
include "mlir/Dialect/LLVMIR/BasicPtxBuilderInterface.td"
        ^
external/llvm-project/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td:20:9: error: Unexpected token at top level
include "mlir/Dialect/LLVMIR/BasicPtxBuilderInterface.td"
        ^

The bazel fixes followed in a subsequent commit at llvm/llvm-project@28b27c1. This PR bumps LLVM by a few more commits (to include the bazel fixes), which restores Torch-MLIR's bazel build to 🟢.

GHA workflow to test bazel build: https://github.com/sjain-stanford/torch-mlir/actions/runs/6555101471/job/17803082508

@stellaraccident stellaraccident marked this pull request as ready for review October 18, 2023 01:35

@stellaraccident stellaraccident left a comment


This week was a mess of conflicting upstream patches. Sorry...

@sjain-stanford

> This week was a mess of conflicting upstream patches. Sorry...

No worries at all! It's unavoidable, though, with bazel builds not being merge-gating upstream 😅. Thanks for the quick ✅.

@sjain-stanford sjain-stanford merged commit 52abae1 into llvm:main Oct 18, 2023
5 checks passed
@sjain-stanford sjain-stanford deleted the sambhav/llvm_bump branch October 18, 2023 05:00
sjain-stanford added a commit to cruise-automation/mlir-tcp that referenced this pull request Oct 20, 2023
…ync (#11)

## Why
When bumping LLVM, it is crucial to be able to test all downstream
repos that depend on it, to ensure they work **in tandem** (and not just
in isolation).

In the past, LLVM upgrades were simpler because torch-mlir took a hard
dependency on mhlo/stablehlo and, in doing so, ensured that the llvm
"green commit" (sha1) that torch-mlir and stablehlo were built+tested
against was pre-identified. During this time mlir-tcp was developed on a
branch of torch-mlir.

This meant when upgrades were needed downstream, we’d simply point to
torch-mlir@HEAD (sha4) and pick the llvm-project (sha1) and
mhlo/stablehlo (sha3) hashes it’d refer to, since these are already
tested to work together. This became our set of green commits
(llvm@sha1, stablehlo@sha3, torch-mlir@sha4) for downstream integrations
(e.g., the cruise monorepo).

<img width="500" alt="image"
src="https://github.com/cruise-automation/mlir-tcp/assets/19234106/42078522-466c-449f-8d7e-496facc1447c">

At present, the situation is more complicated because torch-mlir no
longer takes a hard dependency on stablehlo (stablehlo e2e tests
[disabled](llvm/torch-mlir#2460)).

Here are the details of a recent upgrade scenario that motivated this RFC.

We picked torch-mlir@HEAD which was right after the llvm bump in
llvm/torch-mlir#2511 pointing to
llvm/llvm-project@b44b349,
but soon realized (when we started building torch-mlir) that the llvm
bazel build upstream was broken:

```
ERROR: /root/.cache/bazel/_bazel_root/b89349c08f7224396763d14fe35cba11/external/llvm-project/mlir/BUILD.bazel:5837:18: TdGenerate
external/llvm-project/mlir/include/mlir/Dialect/LLVMIR/NVVMOpsInterface.h.inc failed: (Exit 1): mlir-tblgen failed: error executing command ...

external/llvm-project/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td:20:9: error: Could not find include file 'mlir/Dialect/LLVMIR/BasicPtxBuilderInterface.td'
include "mlir/Dialect/LLVMIR/BasicPtxBuilderInterface.td"
        ^
```

The bazel fixes followed in a subsequent commit at
llvm/llvm-project@28b27c1.
Hence llvm had to be re-bumped in torch-mlir
(llvm/torch-mlir#2517). However, after a bit
more work we hit the failing stablehlo tests shown below. These revealed
that the stablehlo commit pointed to by torch-mlir could no longer be
used, and that we had to separately identify the sha3 of stablehlo that
would build cleanly against sha1 of llvm.

```
@stablehlo//stablehlo/conversions/tosa/tests:binary.mlir.test            FAILED in 0.7s
@stablehlo//stablehlo/tests:print_stablehlo.mlir.test                    FAILED in 4.7s
```


This means the burden of identifying the llvm green commit (one that
works across the board) has shifted further downstream from torch-mlir.
Incidentally, we are in a great position to leverage mlir-tcp to identify
the set of green commits, given that it already depends directly on each
of these repos.

<img width="500" alt="image"
src="https://github.com/cruise-automation/mlir-tcp/assets/19234106/cadd38c4-71ec-45b0-8888-85ac0bfd4e99">


## What
This PR is an attempt to leverage the mlir-tcp repo as our "proxy" for
such downstream integrations, and _I think_ contains everything needed
to be able to do that.

## How
Specifically, we should now be able to run these from the comfort of
`mlir-tcp`:

```shell
bazel test --config=clang_linux @llvm-project//mlir/...
bazel test --config=clang_linux @stablehlo//...
bazel test --config=clang_linux @torch-mlir//...
```
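As a rough illustration of how the three suites above combine into a single "green" verdict, here is a minimal sketch. The `check_green` function and the `TEST_CMD` variable are hypothetical names introduced only for this example and are not part of this PR; `TEST_CMD` exists solely so the loop can be dry-run without bazel.

```shell
#!/usr/bin/env bash
# Sketch only: a set of pinned commits counts as "green" when every
# third-party suite passes from within mlir-tcp.
# TEST_CMD is a hypothetical override (defaults to the bazel invocation above).
TEST_CMD="${TEST_CMD:-bazel test --config=clang_linux}"

check_green() {
  local status=0
  local target
  for target in "$@"; do
    # Intentionally unquoted: TEST_CMD holds a command plus its flags.
    if $TEST_CMD "$target"; then
      echo "PASS $target"
    else
      echo "FAIL $target"
      status=1
    fi
  done
  # Non-zero if any suite failed, so the caller can reject this commit set.
  return "$status"
}
```

It could be invoked as `check_green @llvm-project//mlir/... @stablehlo//... @torch-mlir//...`; every suite is run even after a failure, so one pass shows which dependencies need a different commit.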

We provide `local_repos.bzl`, which allows easier local testing of
patches that later need to be upstreamed; while they are being upstreamed,
we can land them as patches on our `http_archive` targets.

Note: I include a `stablehlo.patch` that allows testing stablehlo from
`mlir-tcp`. This is temporary and can be removed once
openxla/stablehlo#1810 lands.

This PR also enables each of the 3p test suites as GHA workflows
(non-merge-gating for now; we can change this). These workflows are
automatically skipped unless a change is made to `deps.bzl` (which
usually means bumping 3p deps), as it would be unnecessary to run them
for every PR and every post-merge commit on `main`.
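
The skip condition can be approximated locally with a small shell check. This is only a sketch: the `deps_changed` helper is a hypothetical name, it assumes `deps.bzl` sits at the repository root, and the actual workflows may well rely on GitHub Actions' built-in `paths` filter instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads changed file paths from stdin (one per line)
# and exits 0 only if deps.bzl at the repo root is among them.
deps_changed() {
  # -x matches the whole line, -F treats the pattern as a literal string.
  grep -qxF 'deps.bzl'
}
```

For example, assuming an `origin/main` base ref is available, `git diff --name-only origin/main...HEAD | deps_changed` succeeds only when the 3p suites would be worth running.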

Here's a snapshot from this PR's workflows, taken after bumping the
stablehlo commit.

<img width="747" alt="image"
src="https://github.com/cruise-automation/mlir-tcp/assets/19234106/e535ed39-33f7-4941-958c-3a5d0c0adef6">