CMake build support #1

Open
joker-eph opened this issue Aug 9, 2022 · 7 comments
Labels
enhancement New feature or request

Comments

@joker-eph
Contributor

XLA builds with Bazel at the moment. Is it desirable to also have a CMake build?
Who would benefit from this, and what workflows would it enable that aren't doable or easy with the current Bazel configuration?

@joker-eph joker-eph added the enhancement New feature or request label Aug 9, 2022
@etherzhhb

etherzhhb commented Aug 9, 2022

Maybe not a strong reason, but: in some teams, all of their software packages are built with CMake except TF/XLA. If XLA also provided CMake build support, it would be easier for those teams to manage the build process for their software packages.

@stevecapperarm

Hello,
Bazel has made things tricky for folk in a couple of ways:

  1. Assumption of network access. Distro build daemons typically build without network access, and some corporate systems have restricted network access. Working around this typically involves fetching assets separately and then "preloading" them into a cache just before the build.
  2. Multiple versions of Bazel needed. Distros have to carry several versions of Bazel, each matched to particular projects (newer versions of Bazel have not always been backwards compatible with older projects).

It would be incredibly helpful for distros (and others I suspect) to have cmake support, to fall back to.
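
The "preloading" workaround from point 1 can be sketched roughly as follows. This is only an illustration, assuming Bazel's repository cache uses the content-addressable layout `<cache_root>/content_addressable/sha256/<digest>/file`; that layout, the helper name `preload_into_repository_cache`, and all paths are assumptions that do not come from this thread and should be checked against the Bazel version in use.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preload_into_repository_cache(archive: Path, cache_root: Path) -> Path:
    """Copy an already-downloaded archive into a content-addressable cache.

    Assumed layout (hypothetical, not taken from this thread):
        <cache_root>/content_addressable/sha256/<digest>/file
    """
    digest = sha256_of(archive)
    target_dir = cache_root / "content_addressable" / "sha256" / digest
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "file"
    shutil.copyfile(archive, target)
    return target
```

A build daemon without network access would run something like this for each archive fetched ahead of time, then point Bazel at the directory with `--repository_cache=<cache_root>`. A cache hit presumably also requires that the corresponding repository rule declares the archive's SHA-256.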

@bhack
Contributor

bhack commented Sep 10, 2022

Is this still actively maintained?

https://github.com/google/bazel-to-cmake

@makslevental

> Is this still actively maintained?
>
> https://github.com/google/bazel-to-cmake

fwiw such a script exists inside iree https://github.com/openxla/iree/tree/main/build_tools/bazel_to_cmake

wenscarl added a commit to wenscarl/xla that referenced this issue Oct 30, 2023
copybara-service bot pushed a commit that referenced this issue Nov 2, 2023
Imported from GitHub PR #6599

FP8 cublasLt matmul uses fast accumulation when both operands' precisions are DEFAULT. Otherwise, it falls back to high-precision accumulation. Issue #6168

This PR is closely related to Flax PR #3416 (google/flax#3416).
Copybara import of the project:

--
a4140da by shuw <[email protected]>:

Add FP8 fast accumulation support for cublasLt.

--
9684568 by shuw <[email protected]>:

Improve based on review #1

--
e906d76 by shuw <[email protected]>:

Improve based on review #2

Merging this change closes #6599

FUTURE_COPYBARA_INTEGRATE_REVIEW=#6599 from wenscarl:fp8_fast_accumulation e906d76
PiperOrigin-RevId: 578904075
copybara-service bot pushed a commit that referenced this issue Nov 2, 2023
Imported from GitHub PR #6599

FP8 cublasLt matmul uses fast accumulation when both operands' precisions are DEFAULT. Otherwise, it falls back to high-precision accumulation. Issue #6168

This PR is closely related to Flax PR #3416 (google/flax#3416).
Copybara import of the project:

--
a4140da by shuw <[email protected]>:

Add FP8 fast accumulation support for cublasLt.

--
9684568 by shuw <[email protected]>:

Improve based on review #1

--
e906d76 by shuw <[email protected]>:

Improve based on review #2

Merging this change closes #6599

COPYBARA_INTEGRATE_REVIEW=#6599 from wenscarl:fp8_fast_accumulation e906d76
PiperOrigin-RevId: 578948593
wenscarl added a commit to wenscarl/xla that referenced this issue Nov 15, 2023
wenscarl added a commit to wenscarl/xla that referenced this issue Nov 15, 2023
wenscarl added a commit to wenscarl/xla that referenced this issue Dec 14, 2023
copybara-service bot pushed a commit that referenced this issue Dec 15, 2023
Imported from GitHub PR #7751

Because fast accumulation is turned on in forward mode, the cublasLt FP8 GEMM with a GELU epilogue can run efficiently as a fused kernel. Compared against the XLA-generated GELU kernel on H100, performance improves slightly for a problem of size [8192, 4096] x [4096, 16384] + gelu:

Execution time for matmul using cublasLt and gelu (XLA): 1.28ms
Execution time for matmul_gelu using cublasLt: 1.25ms
Copybara import of the project:

--
e8abce3 by Shu Wang <[email protected]>:

Support cublasLt Fp8 Approx Gelu epilogue fusion.

--
818127c by shuw <[email protected]>:

Remove F32 check

--
5ce3108 by shuw <[email protected]>:

Improve based on review #1

Merging this change closes #7751

COPYBARA_INTEGRATE_REVIEW=#7751 from wenscarl:cublaslt_fp8_gelu 5ce3108
PiperOrigin-RevId: 591236441
@SomeoneSerge

I recall being very happy to see that this is issue #1 🙃

> is it desirable to also have a CMake build?

In short, I'd suggest "yes, from the point of view of (package) distributions", because Bazel doesn't support, and maybe even hinders, dependency injection (in the broad sense, thinking e.g. of CMake's find_package), which is important for achieving consistent package sets and applying patches consistently. For this to be useful in practice, however, one would also need downstream projects like TensorFlow to use XLA in a way that permits substitution.

@joker-eph
Contributor Author

joker-eph commented Feb 14, 2024

Another thing is the efficiency of the build: Bazel builds the code twice (or sometimes three times) because of its host/target split model. CMake does not have this restriction, so a CMake build on my workstation could easily be twice as fast.

It is also much easier to set up things like "use my prebuilt LLVM", which again makes for a much faster build since you only build the XLA code and not "the world".

(Docker containers can be used for dependency and controlling the environment, instead of relying on Bazel for making everything hermetic, at a high cost)

copybara-service bot pushed a commit that referenced this issue Feb 23, 2024
…execution scope

Instead of always constructing the while operation's conditional in the default scope, use the scope of the while operation itself.

This generates correct CUDA graph: https://gist.github.com/ezhulenev/a84192fe8b46a4bf1a934a8baa08ea60

Memset operation launched in a scope #1 is not synchronized with initial condition handle update

PiperOrigin-RevId: 609475974
copybara-service bot pushed a commit that referenced this issue Feb 23, 2024
…execution scope

Instead of always constructing the while operation's conditional in the default scope, use the scope of the while operation itself.

This generates correct CUDA graph: https://gist.github.com/ezhulenev/a84192fe8b46a4bf1a934a8baa08ea60

Memset operation launched in a scope #1 is not synchronized with initial condition handle update

PiperOrigin-RevId: 609742672
@joelberkeley

fwiw I'm finding Bazel extremely difficult to use, and I'm not totally inept. A look around the internet suggests it's a common problem. I'm very much in favour of a simple build tool. I don't know if CMake is that; I've never used it. I imagine this would be appreciated more widely for an open-source project.

wenscarl added a commit to wenscarl/xla that referenced this issue Apr 1, 2024
wenscarl added a commit to wenscarl/xla that referenced this issue Apr 1, 2024
wenscarl added a commit to wenscarl/xla that referenced this issue Apr 1, 2024
wenscarl added a commit to wenscarl/xla that referenced this issue Apr 1, 2024
copybara-service bot pushed a commit that referenced this issue May 1, 2024
copybara-service bot pushed a commit that referenced this issue May 1, 2024
copybara-service bot pushed a commit that referenced this issue May 13, 2024
… to Initialize()

Imported from GitHub PR #12228

The first time that a NormThunk is executed, it will build a cudnn execution plan. This build step can hang if an NCCL collective is running at the same time. To fix this, I've moved the build step to take place during thunk initialization. We only observe this hang when using cudnn 9.

Here's a backtrace from the hang that will be fixed:
```
Thread 585 (Thread 0x7fb9391ff640 (LWP 41364) "main.py"):
#0  0x00007fd3d17cffd9 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fd3d17da24f in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fd070967dfe in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007fd0709c928a in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007f1970d76102 in ?? () from /lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.1.0
#5  0x00007f1970f2c999 in ?? () from /lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.1.0
#6  0x00007f1970a7d4ab in ?? () from /lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.1.0
#7  0x00007f1970d0a9cb in ?? () from /lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.1.0
#8  0x00007fce60b2a98c in cudnn::backend::ExecutionPlan::finalize_internal() () from /lib/x86_64-linux-gnu/libcudnn_graph.so.9.1.0
#9  0x00007fce60aefbb1 in cudnn::backend::Descriptor::finalize() () from /lib/x86_64-linux-gnu/libcudnn_graph.so.9.1.0
#10 0x00007fce60b15bec in cudnnBackendFinalize () from /lib/x86_64-linux-gnu/libcudnn_graph.so.9.1.0
#11 0x00007fd2521b8f39 in cudnn_frontend::ExecutionPlanBuilder_v8::build() () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#12 0x00007fd2521734ba in stream_executor::gpu::(anonymous namespace)::GetExecPlanFromHeuristics(cudnn_frontend::OperationGraph_v8&&, stream_executor::gpu::(anonymous namespace)::CudnnHandle const&, bool) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#13 0x00007fd25216ff9b in stream_executor::gpu::CudnnSupport::NormRunnerFromDesc(stream_executor::Stream*, stream_executor::dnn::AlgorithmDesc const&, stream_executor::dnn::NormKind, double, stream_executor::dnn::TensorDescriptor const&, stream_executor::dnn::TensorDescriptor const&, stream_executor::dnn::TensorDescriptor const&, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#14 0x00007fd24e36b88b in stream_executor::dnn::NormOp::RunnerFromAlgorithmDesc(stream_executor::dnn::AlgorithmDesc const&, stream_executor::dnn::NormOp::Config, stream_executor::Stream*) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#15 0x00007fd24e36ae37 in stream_executor::dnn::LazyOpRunner<stream_executor::dnn::NormOp>::GetOrCreateRunner(stream_executor::dnn::NormOp::Config, stream_executor::Stream*)::{lambda()#1}::operator()() const () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#16 0x00007fd24e36adbc in void absl::lts_20230802::base_internal::CallOnceImpl<stream_executor::dnn::LazyOpRunner<stream_executor::dnn::NormOp>::GetOrCreateRunner(stream_executor::dnn::NormOp::Config, stream_executor::Stream*)::{lambda()#1}>(std::atomic<unsigned int>*, absl::lts_20230802::base_internal::SchedulingMode, stream_executor::dnn::LazyOpRunner<stream_executor::dnn::NormOp>::GetOrCreateRunner(stream_executor::dnn::NormOp::Config, stream_executor::Stream*)::{lambda()#1}&&) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#17 0x00007fd24e36a9bd in stream_executor::dnn::LazyOpRunner<stream_executor::dnn::NormOp>::GetOrCreateRunner(stream_executor::dnn::NormOp::Config, stream_executor::Stream*) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#18 0x00007fd24e369d29 in xla::gpu::RunGpuNorm(xla::gpu::GpuNormConfig const&, stream_executor::DeviceMemoryBase const&, stream_executor::DeviceMemoryBase const&, stream_executor::DeviceMemoryBase const&, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, stream_executor::DeviceMemoryBase const&, stream_executor::Stream*, xla::gpu::RunNormOptions) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#19 0x00007fd24e368be6 in xla::gpu::NormThunk::ExecuteOnStream(xla::gpu::Thunk::ExecuteParams const&) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
```
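
The general shape of the fix described above (doing the one-time build during initialization instead of on first execution) can be illustrated with a small, language-neutral sketch. This is not XLA code: the class `NormThunkSketch`, its methods, and the placeholder plan are all hypothetical and only show the pattern.

```python
import threading


class NormThunkSketch:
    """Hypothetical stand-in for a thunk that needs a one-time plan build."""

    def __init__(self):
        self._plan = None
        self._lock = threading.Lock()

    def _build_plan(self):
        # Placeholder for the expensive, potentially blocking build step.
        return {"built": True}

    def initialize(self):
        # Build the plan up front, before any concurrent work is launched.
        with self._lock:
            if self._plan is None:
                self._plan = self._build_plan()

    def execute(self, x):
        # Execution no longer triggers the build; it only uses the plan.
        if self._plan is None:
            raise RuntimeError("initialize() must run before execute()")
        return x
```

With this structure the build can no longer overlap with work that starts only once execution begins, which is the property the commit message relies on.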
Copybara import of the project:

--
f535330 by Trevor Morris <[email protected]>:

Fix hang with cudnn layer norm by moving cudnn init to Initialize()

Merging this change closes #12228

COPYBARA_INTEGRATE_REVIEW=#12228 from trevor-m:tmorris-norm-init f535330
PiperOrigin-RevId: 633220207
copybara-service bot pushed a commit that referenced this issue Jul 31, 2024
name                                     old cpu/op   new cpu/op   delta
BM_SelectAndScatterF32/128/process_time   889µs ± 1%   740µs ± 3%  -16.70%
BM_SelectAndScatterF32/256/process_time  3.64ms ± 2%  3.00ms ± 1%  -17.64%
BM_SelectAndScatterF32/512/process_time  15.3ms ± 1%  13.1ms ± 3%  -14.61%

PiperOrigin-RevId: 657693426
copybara-service bot pushed a commit that referenced this issue Jul 31, 2024
name                                     old cpu/op   new cpu/op   delta
BM_SelectAndScatterF32/128/process_time   889µs ± 1%   740µs ± 3%  -16.70%
BM_SelectAndScatterF32/256/process_time  3.64ms ± 2%  3.00ms ± 1%  -17.64%
BM_SelectAndScatterF32/512/process_time  15.3ms ± 1%  13.1ms ± 3%  -14.61%

Reverts 9bb1871

PiperOrigin-RevId: 657693426
copybara-service bot pushed a commit that referenced this issue Jul 31, 2024
name                                     old cpu/op   new cpu/op   delta
BM_SelectAndScatterF32/128/process_time   889µs ± 1%   740µs ± 3%  -16.70%
BM_SelectAndScatterF32/256/process_time  3.64ms ± 2%  3.00ms ± 1%  -17.64%
BM_SelectAndScatterF32/512/process_time  15.3ms ± 1%  13.1ms ± 3%  -14.61%

PiperOrigin-RevId: 658063846
7 participants