Fix device list of loaded executable in PJRT plugin for multiple GPUs #19369

PragmaTwice · 2024-12-04T14:48:51Z

It closes #19366, and blocks #19279.

After this PR, ClientOptions::Compile will first check the device assignment in the compile options, and then return the corresponding device list with the loaded executable.

To achieve this, this PR introduces protobuf via FetchContent, scoped to the PJRT plugin. The PJRT client passes compile options encoded as protobuf; the plugin decodes them first and then reads the fields it needs, such as the device assignment.
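As a rough illustration of the lookup described above (not the plugin's actual C++ code), the sketch below models the decoded device assignment with plain Python dataclasses. The field names mirror my understanding of XLA's `DeviceAssignmentProto`; the function name `devices_for_executable`, the string device handles, and the fallback to all addressable devices when no assignment is given are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ComputationDevice:
    # Device ids for one computation, indexed by replica.
    replica_device_ids: List[int] = field(default_factory=list)


@dataclass
class DeviceAssignment:
    # Simplified stand-in for the decoded device assignment message.
    replica_count: int = 0
    computation_count: int = 0
    computation_devices: List[ComputationDevice] = field(default_factory=list)


def devices_for_executable(
    addressable: Dict[int, str],
    assignment: Optional[DeviceAssignment],
) -> List[str]:
    """Return the devices named by the assignment, in first-seen order."""
    if assignment is None or not assignment.computation_devices:
        # Assumption for this sketch: with no assignment, use every device.
        return list(addressable.values())

    selected: List[str] = []
    seen = set()
    for computation in assignment.computation_devices:
        for device_id in computation.replica_device_ids:
            if device_id in seen:
                continue  # the same device may appear in several computations
            if device_id not in addressable:
                raise ValueError(f"device {device_id} is not addressable")
            seen.add(device_id)
            selected.append(addressable[device_id])
    return selected


if __name__ == "__main__":
    gpus = {0: "gpu:0", 1: "gpu:1", 2: "gpu:2", 3: "gpu:3"}
    assignment = DeviceAssignment(
        replica_count=2,
        computation_count=1,
        computation_devices=[ComputationDevice([1, 3])],
    )
    print(devices_for_executable(gpus, assignment))  # ['gpu:1', 'gpu:3']
```

The point of the sketch is only that the device list attached to a loaded executable is derived from the assignment in the compile options rather than being a fixed list.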

ci-exactly: build_packages, test_pjrt

integrations/pjrt/third_party/pjrt_c_api/CMakeLists.txt

PragmaTwice · 2024-12-05T13:29:32Z

It works now and is ready for review : )

ScottTodd

Nice, the FetchContent for protobuf isn't as bad as I was expecting.

integrations/pjrt/cmake/protobuf_cc_library.cmake

integrations/pjrt/CMakeLists.txt

integrations/pjrt/third_party/pjrt_c_api/xla/pjrt/compile_options.proto

ScottTodd · 2024-12-05T16:50:06Z

integrations/pjrt/third_party/pjrt_c_api/xla/xla_data.proto

+// LINT.ThenChange(
+//   https://www.tensorflow.org/code/tensorflow/compiler/xla/tools/driver.cc
+// )


(no action needed for this third_party code)

classic, 404 error :P

(looks like the new link is https://github.com/openxla/xla/blob/main/xla/tools/driver.cc)

Yeah, there are many broken links in XLA since it was separated out from TensorFlow. The path compiler/xla has moved to third_party/xla and is now synced with the openxla/xla repo.

ScottTodd · 2024-12-05T17:09:58Z

integrations/pjrt/cmake/protobuf_cc_library.cmake

+  )
+  target_link_libraries(${_NAME}
+    PUBLIC
+    protobuf::libprotobuf


(For a separate PR that could land before or after this, not needed in this PR)

Fun fun, this increases the build time from 2 minutes (logs here) to 8 minutes (logs here). From 451 build targets to 1035. That's still fast enough to not be too concerned, but we can do better if it starts to impact development iteration time.

We could enable ccache to help a bit there. I'd probably add https://github.com/hendrikmuhs/ccache-action to the workflow, like here:

iree/.github/workflows/ci.yml

Lines 96 to 116 in d48071d

    - name: ccache
      uses: hendrikmuhs/ccache-action@ed74d11c0b343532753ecead8a951bb09bb34bc9 # v1.2.14
      with:
        key: ${{ github.job }}-${{ matrix.name }}
        save: ${{ needs.setup.outputs.write-caches == 1 }}
    - name: CMake - configure
      run: |
        cmake \
          -G Ninja \
          -B ${BUILD_DIR} \
          -DCMAKE_BUILD_TYPE=RelWithDebInfo \
          -DCMAKE_C_COMPILER_LAUNCHER=ccache \
          -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
          -DIREE_BUILD_COMPILER=OFF \
          -DIREE_BUILD_PYTHON_BINDINGS=ON \
          -DIREE_BUILD_SAMPLES=ON \
          -DIREE_ENABLE_LLD=ON \
          -DIREE_ENABLE_ASSERTIONS=ON \
          ${{matrix.driver-options}}
    - name: CMake - build
      run: cmake --build ${BUILD_DIR} -- -k 0

and set these environment variables:

    export CMAKE_C_COMPILER_LAUNCHER=ccache
    export CMAKE_CXX_COMPILER_LAUNCHER=ccache

that might just work...

Yeah, I noticed that too. It adds about 100 targets to build the protoc compiler for C++ codegen and 400+ targets to build the protobuf library and its deps (absl, protobuf-lite, ...). That is more than the original total number of jobs. I'll try ccache in the future : )

As discussed in #19418 (comment), #19418 (review) and #19418 (comment), this adds support for reading `env_option_overrides` as IREE compile flags from the `compile_options` passed by frontends like JAX, on a per-compilation basis. Most of this code already existed but had been commented out because `compile_options` was not yet available at the time; it was introduced by #19369.

A simple use case, which also serves as a test case, is shown here: https://github.com/iree-org/iree/blob/c37a80212dd4a541762fc9fdaaa615b6d0a62829/integrations/pjrt/test/test_compile_options.py#L9-L15

ci-exactly: build_packages, test_pjrt

Signed-off-by: PragmaTwice <[email protected]>
Co-authored-by: Scott Todd <[email protected]>
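To make the idea of treating `env_option_overrides` as compile flags concrete, here is a minimal sketch of one way such a mapping could look. It is an assumption for illustration, not the plugin's implementation: the helper name `overrides_to_flags`, the `--name=value` formatting, and the option names used in the example are all hypothetical.

```python
from typing import Dict, List, Union

# Override values are modelled here as simple scalars (an assumption).
OptionValue = Union[str, bool, int, float]


def overrides_to_flags(env_option_overrides: Dict[str, OptionValue]) -> List[str]:
    """Turn a name -> value override map into `--name=value` flag strings."""
    flags: List[str] = []
    for name, value in env_option_overrides.items():
        if isinstance(value, bool):
            # Check bool before other types: bool is a subclass of int in Python.
            text = "true" if value else "false"
        else:
            text = str(value)
        flags.append(f"--{name}={text}")
    return flags


if __name__ == "__main__":
    # Hypothetical option names, used only to show the shape of the output.
    overrides = {"example-enable-feature": True, "example-level": 3}
    print(overrides_to_flags(overrides))
    # ['--example-enable-feature=true', '--example-level=3']
```

Because the overrides travel with each compilation request, different compilations in the same process can use different flags.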

PragmaTwice force-pushed the pjrt-device-list branch from adf9e55 to 7360ae1 on December 4, 2024 14:53

PragmaTwice force-pushed the pjrt-device-list branch 2 times, most recently from 6a2d394 to 3ae5fd7 on December 5, 2024 13:07

PragmaTwice marked this pull request as ready for review December 5, 2024 13:29

PragmaTwice requested review from benvanik and stellaraccident as code owners December 5, 2024 13:29

PragmaTwice requested a review from ScottTodd December 5, 2024 13:29

ScottTodd reviewed Dec 5, 2024

PragmaTwice and others added 7 commits December 6, 2024 10:42

Pick original source files from XLA

cec5a8f

Signed-off-by: PragmaTwice <[email protected]>

Remove some fields from compile_options.proto

3935316

Signed-off-by: PragmaTwice <[email protected]>

Fix device list in PJRT plugin for multiple GPUs

389d473

Signed-off-by: PragmaTwice <[email protected]>

Replace find_package by FetchContent

be0254b

Signed-off-by: PragmaTwice <[email protected]>

Remove unrelated changes

1e9d8ad

Signed-off-by: PragmaTwice <[email protected]>

Apply suggestions from code review

9a9a35c

Co-authored-by: Scott Todd <[email protected]> Signed-off-by: PragmaTwice <[email protected]>

Rename protobuf_cc_library to iree_pjrt_protobuf_cc_library

b04921b

Signed-off-by: PragmaTwice <[email protected]>

PragmaTwice force-pushed the pjrt-device-list branch from cd5fbab to b04921b on December 6, 2024 02:50

ScottTodd self-requested a review December 6, 2024 21:48

ScottTodd approved these changes Dec 6, 2024

ScottTodd merged commit d88d0a7 into iree-org:main Dec 6, 2024
26 checks passed

This was referenced Dec 7, 2024

Enable rocm and vulkan build in CI workflow for PJRT plugin #19279

Draft

[PJRT] Add support of passing per-compilation compile options #19438

Merged
