Migrate jobs off current GCP GHA runner cluster #18238
Experiments are showing that a local ccache backed by the GitHub Actions cache is going to be nowhere near sufficient for some of the current CI builds. Maybe I have something misconfigured, but I'm seeing cache sizes of up to 2GB still not be enough for Debug or ASan jobs. I can try running with no cache limit to see what that produces, but GitHub's soft limit of 10GB across all cache entries (before it starts evicting entries) will be hit very frequently if too many jobs use unique cache keys.
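For context, a minimal sketch of what this kind of local ccache setup looks like in a workflow; the job name, cache key, and size limit below are hypothetical placeholders rather than the exact CI configuration:

```yaml
jobs:
  linux_x64_clang_debug:
    runs-on: ubuntu-latest
    env:
      CCACHE_DIR: ${{ github.workspace }}/.ccache
      CCACHE_MAXSIZE: 2G  # hypothetical limit; Debug/ASan builds overflow even this
    steps:
      - uses: actions/checkout@v4
      - name: Restore ccache
        uses: actions/cache@v4
        with:
          path: ${{ github.workspace }}/.ccache
          # Per-job keys multiply quickly against GitHub's ~10GB per-repo soft limit.
          key: ccache-linux_x64_clang_debug-${{ github.sha }}
          restore-keys: ccache-linux_x64_clang_debug-
      - name: Build
        run: |
          cmake -G Ninja -B build -S . \
            -DCMAKE_C_COMPILER_LAUNCHER=ccache \
            -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
          cmake --build build
```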
…18252) Progress on #15332 and #18238. The [`build_tools/docker/docker_run.sh`](https://github.com/iree-org/iree/blob/main/build_tools/docker/docker_run.sh) script does a bunch of weird/hacky setup, including setup for `gcloud` (for working with GCP) and Bazel-specific Docker workarounds. Most CMake builds can just use a container for the entire workflow (https://docs.github.com/en/actions/writing-workflows/choosing-where-your-workflow-runs/running-jobs-in-a-container). Note that GitHub in its infinite wisdom changed the default shell _just_ for jobs that run in a container, from `bash` to `sh`, so we flip it back.

These jobs run nightly on GitHub-hosted runners, so I tested here:

* https://github.com/iree-org/iree/actions/runs/10396020082/job/28789218696
* https://github.com/iree-org/iree/actions/runs/10422541951/job/28867245589

(Those jobs should also run on this PR, but they'll take a while.)

skip-ci: no impact on other workflows
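A minimal sketch of the `container:` + default shell workaround described above, assuming an illustrative image tag; the job details are placeholders:

```yaml
jobs:
  linux_x64_clang_debug:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/iree-org/cpubuilder_ubuntu_jammy_x86_64:main  # illustrative image/tag
    defaults:
      run:
        # Containerized jobs default to `sh`; flip the default back to bash.
        shell: bash
    steps:
      - uses: actions/checkout@v4
      - name: Check shell
        run: echo "running under ${BASH_VERSION:-sh}"
```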
Progress on #15332 and #18238. Similar to #18252, this drops a dependency on the [`build_tools/docker/docker_run.sh`](https://github.com/iree-org/iree/blob/main/build_tools/docker/docker_run.sh) script. Unlike that PR, this goes a step further and also stops using [`build_tools/cmake/build_all.sh`](https://github.com/iree-org/iree/blob/main/build_tools/cmake/build_all.sh).

Functional changes:

* No more building `iree-test-deps`
  * We only get marginal value out of compiling test files using a debug compiler
  * Those tests are on the path to being moved to https://github.com/iree-org/iree-test-suites
* No more ccache
  * The debug build cache is too large for a local / GitHub Actions cache
  * I want to limit our reliance on the remote cache at `http://storage.googleapis.com/iree-sccache/ccache` (which uses GCP for storage and needs GCP auth)
  * Experiments show that this build is not significantly faster when using a cache, or at least dropping `iree-test-deps` provides equivalent time savings

Logs before: https://github.com/iree-org/iree/actions/runs/10417779910/job/28864909582 (96% cache hits, 9 minute build but 19 minutes total, due to `iree-test-deps`)
Logs after: https://github.com/iree-org/iree/actions/runs/10423409599/job/28870060781?pr=18255 (no cache, 11 minute build)

ci-exactly: linux_x64_clang_debug

---------

Co-authored-by: Marius Brehler <[email protected]>
Experiments so far: I went through https://github.com/actions/actions-runner-controller and gave it a try with a basic POC, but many things still aren't working yet. To replicate what I've done so far:
These all work fairly well out of the box. A few suggestions:
Currently blocked on getting images working. I'm going to keep working on this, but may pull someone in to help at this point since the k8s part is at least figured out.
I created https://github.com/iree-org/base-docker-images and am working to migrate what's left in https://github.com/iree-org/iree/tree/main/build_tools/docker to that repo, starting with a few workflows that don't have special GCP requirements right now, like https://github.com/iree-org/iree/blob/main/.github/workflows/ci_linux_x64_clang_debug.yml. Local testing of iree-org/base-docker-images#4 looks promising as a replacement. We could also try using the manylinux image, but I'm not sure if we should expect that to work well enough with the base C++ toolchains outside of Python packaging. I gave that a try locally too but got errors.
If we're not sure how we want to set up a remote cache by the time we want to transition, I could at least prep a PR that switches the relevant workflows to stop using a remote cache.
Progress on #15332. This uses a new `cpubuilder_ubuntu_jammy_x86_64` dockerfile from https://github.com/iree-org/base-docker-images.

This stops using the remote cache that is hosted on GCP. Build time _without a cache_ is about 20 minutes on current runners, while build time _with a cache_ is closer to 10 minutes. Build time without a cache is closer to 28-30 minutes on new runners. We can try adding back a cache using GitHub or our own hosted storage.

I tried to continue using the previous cache during this transition period, but the `gcloud` command needs to run on the host, and I'd like to stop using the `docker_run.sh` script. I'm hoping we can keep folding away this sort of complexity by having the build machines run a dockerfile that includes key environment components like utility tools and any needed authorization/secrets (see #18238).

ci-exactly: linux_x64_clang
Shared branch tracking the migration: https://github.com/iree-org/iree/tree/shared/runner-cluster-migration. That currently switches the
Progress on #15332. I'm trying to get rid of the `docker_run.sh` scripts, replacing them with GitHub's `container:` feature. While local development flows _may_ want to use Docker like the CI workflows do, those scripts contained a lot of special handling and file mounting to be compatible with Bazel. Much of that is not needed for CMake and can be folded away, though the `--privileged` option needed here is one exception.

This stops using the remote cache that is hosted on GCP. We can try adding back a cache using GitHub or our own hosted storage as part of #18238.

| Job | Cache? | Runner cluster | Time | Logs |
| -- | -- | -- | -- | -- |
| ASan | Cache | GCP runners | 14 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10620030527/job/29438925064) |
| ASan | No cache | GCP runners | 28 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10605848397/job/29395467181) |
| ASan | Cache | Azure runners | (not configured yet) | |
| ASan | No cache | Azure runners | 35 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10621238709/job/29442788013?pr=18396) |
| | | | | |
| TSan | Cache | GCP runners | 12 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10612418711/job/29414025939) |
| TSan | No cache | GCP runners | 21 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10605848414/job/29395467002) |
| TSan | Cache | Azure runners | (not configured yet) | |
| TSan | No cache | Azure runners | 32 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10621238738/job/29442788341?pr=18396) |

ci-exactly: linux_x64_clang_asan
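A sketch of how the `--privileged` exception can be expressed with the `container:` feature; the image and script names here are placeholders, not the exact workflow contents:

```yaml
jobs:
  linux_x64_clang_asan:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/iree-org/cpubuilder_ubuntu_jammy_x86_64  # illustrative image
      # Extra privileges that docker_run.sh previously passed through for sanitizer runs.
      options: --privileged
    steps:
      - uses: actions/checkout@v4
      - name: Build and test with ASan
        run: ./build_tools/cmake/build_and_test_asan.sh  # hypothetical script name
```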
We're still figuring out how to get build times on the new cluster back to a reasonable level by configuring some sort of cache. ccache (https://ccache.dev/) does not have first class support for Azure Blob Storage, so we are trying a few things:
sccache (https://github.com/mozilla/sccache) is promising since it does have first class support for Azure Blob Storage: https://github.com/mozilla/sccache/blob/main/docs/Azure.md

Either way, we still need to figure out the security/access model. Ideally we'd have public read access to the cache, but we might need to limit even that if the APIs aren't available. We might have to make some (temporary?) tradeoffs where only PRs sent from the main repo get access to the cache via GitHub Secrets (which aren't shared with PRs from forks) 🙁
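As a sketch of what the sccache + Azure configuration could look like in a workflow (the environment variable names come from sccache's Azure docs; the secret and container names are hypothetical placeholders):

```yaml
jobs:
  linux_x64_clang:
    runs-on: ubuntu-latest
    env:
      SCCACHE_AZURE_CONNECTION_STRING: ${{ secrets.AZURE_SCCACHE_CONNECTION_STRING }}  # hypothetical secret
      SCCACHE_AZURE_BLOB_CONTAINER: sccache-container  # hypothetical container name
    steps:
      - uses: actions/checkout@v4
      - name: Build with sccache
        run: |
          cmake -G Ninja -B build -S . \
            -DCMAKE_C_COMPILER_LAUNCHER=sccache \
            -DCMAKE_CXX_COMPILER_LAUNCHER=sccache
          cmake --build build
          sccache --show-stats
```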
As a data point, I've used sccache locally and it worked as expected for our CMake builds.
Yep, I just had good results with sccache locally on Linux, using Azure. I think good next steps are:
Progress on iree-org/iree#18238. https://github.com/mozilla/sccache: we may use this instead of ccache for our shared remote cache usage, considering sccache has first class Azure Blob Storage support: https://github.com/mozilla/sccache/blob/main/docs/Azure.md.
### Cache scopes / namespaces / keys

sccache supports a key prefix for cache entries.
We can use that to have a single storage account for multiple projects, and it will also allow us to better manage the storage in the cloud project itself, e.g. checking the size of each folder or deleting an entire folder. Note that sccache's architecture (https://github.com/mozilla/sccache/blob/main/docs/Architecture.md) includes a sophisticated hash function which covers environment variables, the compiler binary, compiler arguments, files, etc., so sharing a cache folder between e.g. MSVC on Windows and clang on Linux should be fine. I'd still prefer we separate those caches, though. Some naming ideas:
Any of the scopes that have frequently changing names should have TTLs on their files, or we should audit and clean them up manually from time to time, so they don't live indefinitely.
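One hypothetical way the prefix scheme could look in practice (the exact naming has not been settled; these values are only an example):

```yaml
env:
  # Single storage account shared across projects, separated by key prefix so
  # per-folder size checks and cleanup/TTL policies can be applied independently.
  SCCACHE_AZURE_KEY_PREFIX: iree-org/iree/linux_x64_clang_asan  # hypothetical naming scheme
```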
This commit is part of this larger issue that is tracking our migration off the GCP runners, storage buckets, etc.: #18238. In this initial port, we move over one high traffic job (`linux_x86_64_release_packages`) and a few nightlies (`linux_x64_clang_tsan`, `linux_x64_clang_debug`) to monitor and make sure the cluster is working as intended.

Time comparisons:

| Job | Cache? | Runner cluster | Time | Logs |
| -- | -- | -- | -- | -- |
| linux_x86_64_release_packages | GitHub Cache | AKS cluster | 9 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10797464301/job/29948809708) |
| linux_x64_clang_tsan | GCP Cache | AKS cluster | 10 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10797464292/job/29948816896) |
| linux_x64_clang_debug | GCP Cache | AKS cluster | 11 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10797464308/job/29948805561) |
| linux_x64_clang_tsan | No Cache | AKS cluster | 17 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10798471545/job/29952051686) |
| linux_x64_clang_debug | No Cache | AKS cluster | 13 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10798475582/job/29952064138) |
| | | | | |
| linux_x86_64_release_packages | GitHub Cache | GCP runners | 11 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10796348911/job/29945148145) |
| linux_x64_clang_tsan | GCP Cache | GCP runners | 14 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10789692182/job/29923234380) |
| linux_x64_clang_debug | GCP Cache | GCP runners | 15 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10680250213/job/29601266656) |

The GCP cache timings for the AKS cluster are not a great representation of what we will be seeing going forward because the AKS cluster does not have the setup/authentication to write to the GCP cache. We have changes coming in https://github.com/iree-org/iree/tree/shared/runner-cluster-migration that will spin up an Azure cache using sccache to help with the No Cache timings.

Right now the cluster is using 96 core machines, which we can probably tone down when the caching work lands.

---------

Signed-off-by: saienduri <[email protected]>
See #18238. We've finished migrating most load bearing workflows to use a new cluster of self-hosted runners.

These workflows are still using GCP runners and are disabled:

* `build_test_all_bazel`: this may work on the new cluster using the existing `gcr.io/iree-oss/base-bleeding-edge` dockerfile, but it uses some remote cache storage on GCP and I want to migrate that to https://github.com/iree-org/base-docker-images/. Need to take some time to install deps, evaluate build times with/without a remote cache, etc.
* `test_nvidia_t4`, `nvidiagpu_cuda`, `nvidiagpu_vulkan`: we'll try to spin up some VMs in the new cluster / cloud project with similar GPUs. That's a high priority for us, so maybe within a few weeks.

Additionally, these workflows are still enabled but we should find a longer term solution for them:

* `linux_arm64_clang`: this is still enabled in code... for now. We can disable https://github.com/iree-org/iree/actions/workflows/ci_linux_arm64_clang.yml from the UI
* arm64 packages are also still enabled: https://github.com/iree-org/iree/blob/cc891ba8e7da3a3ef1c8650a66af0aa53ceed06b/.github/workflows/build_package.yml#L46-L50
Current status:
…e-org#18511) This commit is part of the larger issue that is tracking our migration off the GCP runners, storage buckets, etc.: iree-org#18238. This builds on iree-org#18381, which migrated:

* `linux_x86_64_release_packages`
* `linux_x64_clang_debug`
* `linux_x64_clang_tsan`

Here, we move the rest of the critical Linux builder workflows off of the GCP runners:

* `linux_x64_clang`
* `linux_x64_clang_asan`

This also drops all CI usage of the GCP cache (`http://storage.googleapis.com/iree-sccache/ccache`). Some workflows now use sccache backed by Azure Blob Storage as a replacement. There are a few issues with this (mozilla/sccache#2258) that prevent us from providing read only access to the cache in PRs created from forks, so **PRs from forks currently don't use the cache and will have slower builds**. We're covering for this slowdown by using larger runners, but if we can roll out caching to all builds then we might use runners with fewer cores.

Along with the changes to the cache, usage of Docker is rebased on images in the https://github.com/iree-org/base-docker-images/ repo and the `build_tools/docker/docker_run.sh` script is now only used by unmigrated workflows (`linux_arm64_clang` and `build_test_all_bazel`).

---------

Signed-off-by: saienduri <[email protected]>
Signed-off-by: Elias Joseph <[email protected]>
Co-authored-by: Scott Todd <[email protected]>
Co-authored-by: Elias Joseph <[email protected]>
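A hedged sketch of how a workflow can skip the remote cache for fork PRs, since secrets are not exposed to them; the exact condition used in the real workflows may differ:

```yaml
- name: Enable sccache (non-fork PRs and postsubmit only)
  if: github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository
  run: |
    # Opt in to sccache only when the cache secrets are actually available.
    echo "CMAKE_C_COMPILER_LAUNCHER=sccache" >> "$GITHUB_ENV"
    echo "CMAKE_CXX_COMPILER_LAUNCHER=sccache" >> "$GITHUB_ENV"
```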
The Bazel build would also benefit from a remote cache we can directly manage and configure for public read + privileged write access. Instructions for Bazel: https://bazel.build/remote/caching#nginx
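For illustration, a Bazel invocation against an HTTP remote cache with public read and restricted write might look roughly like this (the cache endpoint is a hypothetical placeholder):

```yaml
- name: Bazel build with HTTP remote cache (read-only)
  run: |
    # Readers skip uploading; only trusted postsubmit jobs would supply write credentials.
    bazel build //... \
      --remote_cache=https://bazel-cache.example.com \
      --noremote_upload_local_results
```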
Progress on #15332 and #18238. Fixes #16915.

This switches the `build_test_all_bazel` CI job from the `gcr.io/iree-oss/base-bleeding-edge` Dockerfile using GCP for remote cache storage to the `ghcr.io/iree-org/cpubuilder_ubuntu_jammy_x86_64` Dockerfile with no remote cache. With no cache, this job takes between 18 and 25 minutes. Early testing also showed times as long as 60 minutes if the Docker command and runner are both not optimally configured for Bazel (e.g. not using a RAM disk).

The job is also moved from running on every commit to running on a nightly schedule while we evaluate how frequently it breaks and how long it takes to run. If we set up a new remote cache (https://bazel.build/remote/caching), we can move it back to running more regularly.
Progress on #15332. This was the last active use of [`build_tools/docker/`](https://github.com/iree-org/iree/tree/main/build_tools/docker), so we can now delete that directory: #18566.

This uses the same "cpubuilder" dockerfile as the x86_64 builds, which is now built for multiple architectures thanks to iree-org/base-docker-images#11. As before, we install a qemu binary in the dockerfile, this time using the approach in iree-org/base-docker-images#13 instead of a forked dockerfile. Prior PRs for context:

* #14372
* #16331

Build time varies pretty wildly depending on cache hit rate and the phase of the moon:

| Scenario | Cache hit rate | Time | Logs |
| -- | -- | -- | -- |
| Cold cache | 0% | 1h45m | [Logs](https://github.com/iree-org/iree/actions/runs/10962049593/job/30440393279) |
| Warm (?) cache | 61% | 48m | [Logs](https://github.com/iree-org/iree/actions/runs/10963546631/job/30445257323) |
| Warm (hot?) cache | 98% | 16m | [Logs](https://github.com/iree-org/iree/actions/runs/10964289304/job/30447618503?pr=18569) |

CI history (https://github.com/iree-org/iree/actions/workflows/ci_linux_arm64_clang.yml?query=branch%3Amain) shows that regular 97% cache hit rates and 17 minute job times are possible. I'm not sure why one test run only got 61% cache hits. This job only runs nightly, so that's not a super high priority to investigate and fix.

If we migrate the arm64 runner off of GCP (#18238) we can further simplify this workflow by dropping its reliance on `gcloud auth application-default print-access-token` and the `docker_run.sh` script. Other workflows are now using `source setup_sccache.sh` and some other code.
Fixes #15332. The dockerfiles in this repository have all been migrated to https://github.com/iree-org/base-docker-images/ and all uses in-tree have been updated. I'm keeping the https://github.com/iree-org/iree/blob/main/build_tools/docker/docker_run.sh script for now, but I've replaced nearly all uses of that with GitHub's `container:` argument (https://docs.github.com/en/actions/writing-workflows/choosing-where-your-workflow-runs/running-jobs-in-a-container). All remaining uses need to run some code outside of Docker first, like `gcloud auth application-default print-access-token`. As we continue to migrate jobs off of GCP runners (#18238), we'll be using a different authentication and caching setup that removes that requirement.
Hey @ScottTodd, IREE has now been added to the list of supported repos for https://gitlab.arm.com/tooling/gha-runner-docs 🥳 Would you be able to give that a try? C7g instances include SVE (these are Graviton 3 machines) and that's what I suggest using. Here's an overview of the hardware: I'd probably start with

-Andrzej
Thanks! Do you know if the
Just the repo. Let me know if that's an issue - these are "early days" and IREE is effectively one of the guinea pigs :)
Context: #18238 (comment)

This uses https://gitlab.arm.com/tooling/gha-runner-docs to run on Arm-hosted GitHub Actions (GHA) runners, instead of the runners that Google has been hosting. Note that GitHub also offers Arm runners, but they are expensive and require a paid GitHub plan to use (https://docs.github.com/en/actions/using-github-hosted-runners/using-larger-runners/about-larger-runners).

For now this is continuing to run nightly, but we could also explore running more regularly if Arm wants and approves. We'd want to figure out how to use a build cache efficiently for that, though. We can use sccache storage on Azure, but there might be charges between Azure and AWS for the several gigabytes of data moving back and forth. If we set up a dedicated cache server (#18557), we'll at least have more visibility into and control over the storage and compute side of billing.

Test runs:

* https://github.com/iree-org/iree/actions/runs/11114007934 (42 minutes)
* https://github.com/iree-org/iree/actions/runs/11114658487 (40 minutes)
* https://github.com/iree-org/iree/actions/runs/11114757082 (38 minutes)
* https://github.com/iree-org/iree/actions/runs/11128634554 (40 minutes)

skip-ci: no impact on other builds
ARM runners are migrated (assuming tonight's nightly package build works). We're still working on bringing back NVIDIA/CUDA runners and larger Windows runners.
Should I pull down the other ARM runners?
Yes, that should be fine.
…9242) This code was used to configure self-hosted runners on GCP. We have migrated self-hosted runners to Azure and on-prem runners, so this fixes iree-org#18238. Sub-issues to add back Windows and NVIDIA GPU coverage will remain open.
Following the work at #17957 and #16203, it is just about time to migrate away from the GitHub Actions runners hosted on Google Cloud Platform.
### Workflow refactoring tasks

Refactor workflows such that they don't depend on GCP:

* `gcloud` command
* `http://storage.googleapis.com/iree-sccache/ccache` (configured using `setup_ccache.sh`)
* `build_tools/github_actions/docker_run.sh` script

### Runner setup tasks

### Transition tasks

Switch all jobs that need a self hosted runner to the new runners:

* `linux_x86_64_release_packages` in `pkgci_build_packages.yml`
* `linux_x64_clang` in `ci_linux_x64_clang.yml`
* `linux_x64_clang_asan` in `ci_linux_x64_clang_asan.yml`
* `linux_x64_clang_tsan` in `ci_linux_x64_clang_tsan.yml`
* `linux_x64_clang_debug` in `ci_linux_x64_clang_debug.yml`
* `build_test_all_bazel` in `ci.yml`
* `linux_arm64_clang` in `ci_linux_arm64_clang.yml`
* `build_packages` (arm64) in `build_package.yml`
* `test` in `pkgci_test_nvidia_t4.yml`
* `nvidiagpu_cuda` in `pkgci_regression_test.yml`
* `nvidiagpu_vulkan` in `pkgci_regression_test.yml`

### Other