Develop upstream sync 241210 #2783

Merged
merged 1,426 commits into from
Jan 7, 2025
36a0ea8
Introduce xla_gpu flag for Dumping HloUnoptimizedSnapshots
Dec 5, 2024
fa13d03
Automated Code Change
tensorflower-gardener Dec 5, 2024
f1354b7
Update GraphDef version to 2067.
tensorflower-gardener Dec 5, 2024
d037b7c
compat: Update forward compatibility horizon to 2024-12-05
tensorflower-gardener Dec 5, 2024
db7cabf
Fix typos in documentation strings
Venkat6871 Dec 5, 2024
1e777c7
[xla:collectives] NFC: Move all NCCL collectives to Collectives API
ezhulenev Dec 5, 2024
687634c
Automated Code Change
tensorflower-gardener Dec 5, 2024
b2a657c
Merge pull request #82141 from sandeepgupta12:master
tensorflower-gardener Dec 5, 2024
0ba5e64
Automated Code Change
tensorflower-gardener Dec 5, 2024
fe29bc6
Reverts f13c44149dba9518dc7f93b85342bcf72e5580fd
thomasjoerg Dec 5, 2024
3e1162e
Add support for bitcasts that add a degenerate majormost dimension wh…
tensorflower-gardener Dec 5, 2024
205b42c
Automated Code Change
tensorflower-gardener Dec 5, 2024
19ad44a
Update the TODO bug number in xla_triton_sparse_passes.cc.
tensorflower-gardener Dec 5, 2024
d437297
PR #19913: [ROCm] Do not use fast approximation for exp and log
mmakevic-amd Dec 5, 2024
cb985d0
[XLA:CPU] Implement 2D custom algorithm for strided transposed convol…
Adam-Banas Dec 5, 2024
728087a
Integrate LLVM at llvm/llvm-project@71ac1eb50955
gribozavr Dec 5, 2024
9fae081
Automated Code Change
tensorflower-gardener Dec 5, 2024
52dede2
PR #20153: Delete Exchange of Collectives and Dequantization in GEMM …
philipphack Dec 5, 2024
d63ae75
Automated Code Change
tensorflower-gardener Dec 5, 2024
fc996d0
PR #18616: [XLA:CPU][oneDNN] Refactor code that fuses Add operation w…
akhilgoe Dec 5, 2024
ebb9b42
Automated Code Change
tensorflower-gardener Dec 5, 2024
2e39667
[XLA:GPU] Make VLOG explanation in SortRewriter more helpful.
thomasjoerg Dec 5, 2024
4c3aa84
Automated Code Change
tensorflower-gardener Dec 5, 2024
d6ccc6a
Adding dumping functionality for HloUnoptimizedSnapshot.
Dec 5, 2024
1c3c327
[XLA:GPU][Emitters] Create `xla_ops` dialect for the platform-indepen…
pifon2a Dec 5, 2024
ed9b19c
Fix an initialization order bug in se_gpu_pjrt_compiler.
beckerhe Dec 5, 2024
f857329
Add support for Pad operation in Hhost_offload_utils::GetPredecessors().
tensorflower-gardener Dec 5, 2024
f7e1750
[XLA:GPU] Extend atomic_rmw to support vector updates.
pifon2a Dec 5, 2024
9ac393e
Integrate LLVM at llvm/llvm-project@dd7a3d4d798e
gribozavr Dec 5, 2024
ab5ec27
Reverts fd9471e7d48e8e86684c847c0e1897c76e737805
Moerafaat Dec 5, 2024
4501dc7
[xla-auto-sharding] Add SolveRandom() baseline algorithm for a random…
Dec 5, 2024
9b947fb
Enable explicit batch dims of gather/scatter operations in GSPMD. The…
ZixuanJiang Dec 5, 2024
c8da09e
Add public API to load model from file or buffer
tensorflower-gardener Dec 5, 2024
29fb179
Remove if_oss from B100 references
IllogicalMoose Dec 5, 2024
96b58a3
Change XlaCallModuleLoader to take module string by reference, avoid …
GleasonK Dec 5, 2024
b043999
[XLA:CPU] Add method to benchmark compile times
WillFroom Dec 5, 2024
e2235da
[xla:gpu] Update custom call config WARN to VLOG
ezhulenev Dec 5, 2024
a09f325
Add array interoperability python bindings to `xla::Literal`
WillFroom Dec 5, 2024
d107b6d
Avoid using getCurrentVersion when re-serializing XlaCallModule ops
GleasonK Dec 5, 2024
8c16959
[XLA:CPU] Enable Convert test for F8E3M4
tvladyslav Dec 5, 2024
0ff0daf
[JAX] Add end-to-end execution support in colocated Python API
hyeontaek Dec 5, 2024
5a9b7af
MHLO defns for a ragged dot that permits ragged batch and contraction.
pravnar Dec 5, 2024
157b627
Internal change only
SiqiaoWu1993 Dec 5, 2024
23c19d7
Make the following tensorflow python targets visible publicly for LiteRT
ecalubaquib Dec 5, 2024
97509fb
[xla:collectives] NFC: Delete unused ScopedPlanAllocator
ezhulenev Dec 5, 2024
5855d96
Add LiteRT CompiledModel API
terryheo Dec 5, 2024
fb3c02c
IFRT Proxy: Make `Executable::Delete()` and most `::Execute` async.
tensorflower-gardener Dec 5, 2024
534ecb6
Integrate LLVM at llvm/llvm-project@fdb90cef75ca
durin42 Dec 5, 2024
7fbf0f7
Make the targets :tensor_testutil, :fake_input, :jpeg_internal, :stat…
ecalubaquib Dec 5, 2024
f0c07ce
Add flatten_conditional and conditional_value to HloControlFlowFlatte…
tensorflower-gardener Dec 5, 2024
712dc9d
[hlo-opt] Register gpu passes and add "--list-passes" option
abhigunj Dec 5, 2024
660ed52
Update TFRT dependency to use revision
tensorflower-gardener Dec 5, 2024
da67bab
Remove #if GOOGLE_CUDA from matmul_utils.cc.
klucke Dec 5, 2024
a9a236c
Only return the valid part of PCI Bus ID when constructing the C++ st…
tensorflower-gardener Dec 5, 2024
f635bb7
Reverts 5c36415e4bafd2ef7ed219fae423d3b79ba017df
tensorflower-gardener Dec 5, 2024
ed7496a
Use default HloParserOptions in HLO runner.
frgossen Dec 5, 2024
0956278
Remove #if TENSORFLOW_USE_ROCM from gpu_executable.cc.
klucke Dec 5, 2024
3bbe458
Remove TENSORFLOW_USE_ROCM #ifdefs in command_buffer_thunk_test.cc.
klucke Dec 5, 2024
db05a09
Correct test for force snappy compression.
tensorflower-gardener Dec 5, 2024
a36981a
Remove obsolete TODOs and the ones associated with closed bug in thi…
toli-y Dec 5, 2024
6f12084
Add minimal test case to collective pipeliner
frgossen Dec 6, 2024
ae81cca
Add wrapper for PM Sampling metrics to be added from std::vector<PmSa…
tensorflower-gardener Dec 6, 2024
93e6e0b
[xla:collectives] Make NcclApi an alias for GpuCollectives
ezhulenev Dec 6, 2024
07e878d
Remove nsync from TensorFlow
majnemer Dec 6, 2024
cbb6242
Integrate LLVM at llvm/llvm-project@698d83218565
durin42 Dec 6, 2024
0117e22
[Cleanup] Use absl::StrCat
frgossen Dec 6, 2024
7a89ffa
[xla] Add detailed tracing to RendezvousSingle
ezhulenev Dec 6, 2024
088a25d
[Cleanup] Use push_back instead of emplace_back where appropriate
frgossen Dec 6, 2024
442fe5b
Correct the order of arguments in comment for RaggedAllToAll.
tensorflower-gardener Dec 6, 2024
6872820
Demote log from ERROR to VLOG(2).
SandSnip3r Dec 6, 2024
9ae2777
Added interface "MatchTrivialLoopRange" to while_loop_analysis to get…
fhoushmand Dec 6, 2024
9a4c815
Correct the definition of tfl.softmax
oToToT Dec 6, 2024
92698ba
[xla:gpu] NFC: Replace all users of NcclApi with GpuCollectives
ezhulenev Dec 6, 2024
8e88ce1
Update nanobind dependency to v2.4.0.
hawkinsp Dec 6, 2024
1e9c8d7
Automated Code Change
tensorflower-gardener Dec 6, 2024
4fbd265
Automated Code Change
tensorflower-gardener Dec 6, 2024
a1ea273
Automated Code Change
tensorflower-gardener Dec 6, 2024
96aade1
[XLA] Support splitting ragged all-to-all into async start and done.
tensorflower-gardener Dec 6, 2024
467299a
[XLA:LatencyHidingScheduler] Do not allow target-defined resources to…
seherellis Dec 6, 2024
b30c903
Integrate Triton up to [a69ebfaa](https://github.com/openai/triton/co…
gflegar Dec 6, 2024
785a085
Automated Code Change
tensorflower-gardener Dec 6, 2024
c82b7cf
Automated Code Change
tensorflower-gardener Dec 6, 2024
474e687
Update GraphDef version to 2068.
tensorflower-gardener Dec 6, 2024
a392af9
compat: Update forward compatibility horizon to 2024-12-06
tensorflower-gardener Dec 6, 2024
53c729b
Reverts d72c8f988db6072dd00d74cc28d3261eadbfeee7
akuegel Dec 6, 2024
a5b1537
Automated Code Change
tensorflower-gardener Dec 6, 2024
8202291
Automated Code Change
tensorflower-gardener Dec 6, 2024
073ca1a
[xla:ffi] Add num_threads() API to external FFI thread pool
ezhulenev Dec 6, 2024
7776d33
[XLA:GPU] Add a test for RaggedAllToAll that runs on 8 GPUs.
olegshyshkov Dec 6, 2024
ba2f046
Integrate LLVM at llvm/llvm-project@2ccf7ed277df
gribozavr Dec 6, 2024
cdcf5c4
Exclude more broken tilings from Triton exhaustive autotuning
beckerhe Dec 6, 2024
ac0fcde
Fix type stub for register_node to note that to_iterable_with_keys is…
hawkinsp Dec 6, 2024
c4c444a
[xla-auto-sharding] Generalize multilevel flag to heuristic solver op…
Dec 6, 2024
c2cc1de
[XLA:GPU] Drop unnecessary bitcast from the chain convert(s4)->bitcas…
loislo Dec 6, 2024
9bb8339
PR #19669: Replace custom free-threading flag by rules_python is_py_f…
vfdev-5 Dec 6, 2024
7094f3e
[Cleanup] Use HloPredicateIs(Not)Op
frgossen Dec 6, 2024
899b1e1
[StableHLO] Refactor XlaCallModule to use more upstream StableHLO mac…
GleasonK Dec 6, 2024
b6e5323
[Cleanup] Use HloPredicateIs(Not)Op
frgossen Dec 6, 2024
6c46f46
Re-enable current StableHLO current version attribute in PJRT
GleasonK Dec 6, 2024
f705ef4
Allow programatic override of the default values for the gcs file sys…
Nicop06 Dec 6, 2024
12f3177
#sdy Fix MHLO<->HLO translation bug with multi result host offload fu…
bartchr808 Dec 6, 2024
c33f651
Merge pull request #82288 from tensorflow:fixtypos09
tensorflower-gardener Dec 6, 2024
1a7aafa
[Cleanup] Use HloPredicateIs(Not)Op to unify opcode checking across XLA
frgossen Dec 6, 2024
3290fe0
Remove the use of GOOGLE_CUDA and TENSORFLOW_USE_ROCM in buffer_compa…
klucke Dec 6, 2024
2da8686
[XLA] Move the scheduling annotation from fused instruction to the ca…
seherellis Dec 6, 2024
bcd2b16
[xla-auto-sharding] Add SolveGreedy() heuristic to make local greedy …
Dec 6, 2024
2f9738c
Integrate StableHLO at openxla/stablehlo@b3d3cacd
abhigunj Dec 6, 2024
09de5c8
[tsl] Deprecate tsl::mutex and tsl::condition_variable, make tsl::Con…
majnemer Dec 6, 2024
77c71d5
Remove the use of TENSORFLOW_USE_ROCM from convolution_thunk.cc.
klucke Dec 6, 2024
cd7d55b
[Cleanup] Use HloPredicateIs(Not)Op
frgossen Dec 6, 2024
3a363f6
hlo_original_value: Don't blow up when printing empty values.
pizzud Dec 6, 2024
e7326af
Remove UpgradeLegacyGraph wrapper to FunctionalizeControlFlow. Only g…
rocketas Dec 6, 2024
278f7e5
Fixes an input signature display issue in loaded TF1 SavedModels, and…
wangpengmit Dec 6, 2024
5643043
[Cleanup] Do not std::move on return
frgossen Dec 6, 2024
f1fe201
[XLA:GPU:ROCm] Restore threads per warp behavior
majnemer Dec 6, 2024
5d80ff5
[FuncGraph] Micro-optimize `as_default()` method.
mrry Dec 6, 2024
b034450
Add method for HloRunnerAgnosticTestBase implementations to preproces…
nvgrw Dec 6, 2024
948c10d
Allow platform-specific relaxation of fusion restrictions on in-place…
tensorflower-gardener Dec 6, 2024
7569116
Add JIT compiler plugin support and test case for QC
tensorflower-gardener Dec 6, 2024
740d9af
Add option for `HloRunnerPjRt` to forward parameter layout for on dev…
nvgrw Dec 6, 2024
3a93a59
Add ability to disable TargetConfig metadata for se_gpu_pjrt_client.
pschuh Dec 6, 2024
37682c9
Migrate runtime_client from using deprecated ConvertFunctionToMlir. F…
rocketas Dec 6, 2024
104b124
Allow enabling cl_khr_command_buffer in RestoreDeserialized
oToToT Dec 6, 2024
640505b
[xla:cpu] Add xnnpack dependency to xla:cpu runtime
ezhulenev Dec 7, 2024
e597532
Add a public UpdateEntryComputationLayout method
Dec 7, 2024
1391ddf
Update LiteRT Model API
terryheo Dec 7, 2024
af46a3d
Delete HloUnaryInstruction used in CreateUnary. Instead make result_a…
hanrach9 Dec 7, 2024
3ab9024
Frontend for Inference Profile.
cliveverghese Dec 7, 2024
4c65cd1
Fix the heuristic for extending events in derived timeline to have a …
tensorflower-gardener Dec 7, 2024
fcbab2b
Automated Code Change
tensorflower-gardener Dec 7, 2024
90021f3
Automated Code Change
tensorflower-gardener Dec 7, 2024
7cd13b1
Enable sampling for inference profile and expose them in inference pr…
cliveverghese Dec 7, 2024
f390239
Automated Code Change
tensorflower-gardener Dec 7, 2024
c4fe609
Update GraphDef version to 2069.
tensorflower-gardener Dec 7, 2024
55c5273
compat: Update forward compatibility horizon to 2024-12-07
tensorflower-gardener Dec 7, 2024
c275e34
Automated Code Change
tensorflower-gardener Dec 7, 2024
8940d75
Automated Code Change
tensorflower-gardener Dec 7, 2024
4e3c0d4
Automated Code Change
tensorflower-gardener Dec 7, 2024
7e9bd9b
Merge pull request #64651 from redwrasse:redwrasse/multinomial-op-log…
tensorflower-gardener Dec 8, 2024
ff20f87
Add OverviewInferenceLatency to OverviewPage.
cliveverghese Dec 8, 2024
e45ac6c
Automated Code Change
tensorflower-gardener Dec 8, 2024
100f983
Defensively handle invalid device properties.
tensorflower-gardener Dec 8, 2024
f01a619
compat: Update forward compatibility horizon to 2024-12-08
tensorflower-gardener Dec 8, 2024
6122b63
Update GraphDef version to 2070.
tensorflower-gardener Dec 8, 2024
04d40d7
Example is added to tf.math.truediv function
LakshmiKalaKadali Dec 8, 2024
d8630e5
Update math_ops.py
LakshmiKalaKadali Dec 8, 2024
441335e
Automated Code Change
tensorflower-gardener Dec 8, 2024
e123878
Automated Code Change
tensorflower-gardener Dec 8, 2024
2d41591
Automated Code Change
tensorflower-gardener Dec 9, 2024
2618155
Update math_ops.py
LakshmiKalaKadali Dec 9, 2024
172a438
Automated Code Change
tensorflower-gardener Dec 9, 2024
8eaded7
Update GraphDef version to 2071.
tensorflower-gardener Dec 9, 2024
0e83df7
Merge pull request #82286 from linux-on-ibm-z:s390x_fix
tensorflower-gardener Dec 9, 2024
931d10f
PR #20294: [cuBLAS] Relax test error margin for int4 dot
sergey-kozub Dec 9, 2024
442bc39
Reverts a27025f94345d873bc9e4718b4afc45651ea2db2
beckerhe Dec 9, 2024
c1dba70
Remove barely used `Shape::Swap` method
toli-y Dec 9, 2024
ac56327
Update math_ops.py
LakshmiKalaKadali Dec 9, 2024
318d29e
[xla] Update warnings.bazelrc
penpornk Dec 9, 2024
a4c24fc
PR #20241: Updated Typo's in multiple documents
kiransair Dec 9, 2024
5bcaafc
PR #18989: [AllGatherCSE] Add a pass that CSEs all-gathers on paramet…
patrick-toulme Dec 9, 2024
15da29a
Automated Code Change
tensorflower-gardener Dec 9, 2024
2dfdb24
compat: Update forward compatibility horizon to 2024-12-09
tensorflower-gardener Dec 9, 2024
ffbb2ab
Automated Code Change
tensorflower-gardener Dec 9, 2024
73057e2
[XLA:GPU] Use `absl::Microseconds` instead of doing duration arithmetic.
allanrenucci Dec 9, 2024
78b5a06
Reland of PR #19571. Fix test FunctionalHloRunnerTest.ShardedAutotuni…
tensorflower-gardener Dec 9, 2024
5155ffc
PR #19451: Setting xla_gpu_multi_streamed_windowed_einsum to true by …
Tixxx Dec 9, 2024
73e1e9f
[XLA:Python] Use nanobind::isinstance from upstream nanobind, delete …
hawkinsp Dec 9, 2024
fb8ff76
[XLA:CPU] Add a Python extension for KernelRunner.
WillFroom Dec 9, 2024
34b2be7
Integrate LLVM at llvm/llvm-project@1d95825d4d16
tensorflower-gardener Dec 9, 2024
67c1c99
[xla:gpu] Removed redundant parameter from `CompileTritonToLLVM`
superbobry Dec 9, 2024
521421e
Add support for CUDA 12.6.3 and CUDNN 9.5.1/9.6.0.
tensorflower-gardener Dec 9, 2024
8b70b81
Add LiteRtCompilationOptions to CompiledModel API
terryheo Dec 9, 2024
7372896
[XLA:GPU] Remove unused `xla_experimental_exec_time_optimization_effo…
allanrenucci Dec 9, 2024
afe043f
Clarify deletion timeline of tf.lite.Interpreter in TF 2.19.0 release…
pak-laura Dec 9, 2024
bd7258c
[numpy] Fix test failures under NumPy 2.2.
hawkinsp Dec 9, 2024
80ce1da
Move legacyfedinput code out of import_model.
rocketas Dec 9, 2024
402833c
Handle the case where there are multiple levels of formatting ops bet…
tensorflower-gardener Dec 9, 2024
183830f
Split up cusolver_context into CUDA-specific and ROCM-specific parts.
klucke Dec 9, 2024
8b5b984
Add some comments for copy insertion.
tensorflower-gardener Dec 9, 2024
bddec75
[Cleanup] Use HloPredicateIs(Not)Op
frgossen Dec 9, 2024
e169e90
Debugging code cleanup?
tensorflower-gardener Dec 9, 2024
ea1d3bc
Add CPU specific passes for hlo-opt tool.
hanrach9 Dec 9, 2024
04789de
Stop using DISABLED_ON_GPU_ROCM for GPU tests, and instead just use G…
klucke Dec 9, 2024
6f65d50
Internal changes only
SiqiaoWu1993 Dec 9, 2024
1587442
Modifies block_rematerialization_factor to be a float, increasing gra…
tensorflower-gardener Dec 9, 2024
60fc1a5
Add inference_latency_chart to tensorboard
cliveverghese Dec 9, 2024
c2c1f02
Modify XlaOp Exp to accept result accuracy as an argument. We want to…
hanrach9 Dec 9, 2024
72db4ac
Add StepEvents required for Inference Profiles on GPU.
cliveverghese Dec 9, 2024
a3ec1cc
[XLA] Fix latency hiding scheduler when faced with annotated no-op in…
seherellis Dec 9, 2024
508565d
IFRT proxy: Add profiler spans to all entrypoints at the client.
tensorflower-gardener Dec 9, 2024
c52225f
Create OpStatsToRooflineModel, in preparation of Roofline Model creation
zzzaries Dec 10, 2024
68cb691
When generating fake arguments for running HLOs via hlo_runner, use a…
tensorflower-gardener Dec 10, 2024
2c98517
Register WhileLoopAllReduceCodeMotion pass to the opt tool
abhigunj Dec 10, 2024
4d4b910
Add RegisterMlirToHloDependentDialects to register required dependent…
abhigunj Dec 10, 2024
2ebc1e4
Add a new test base class for a default PjRt test runner w/ SE interp…
nvgrw Dec 10, 2024
be58e4b
Add an interpreter PjRt client registry for testing.
nvgrw Dec 10, 2024
30f8c58
[XLA] Don't bail when encountering complex loop pipelining patterns
vsytch Dec 10, 2024
5e778d6
Automated Code Change
tensorflower-gardener Dec 10, 2024
e4d1135
Automated Code Change
tensorflower-gardener Dec 10, 2024
f000550
Automated Code Change
tensorflower-gardener Dec 10, 2024
36f53e5
Automated Code Change
tensorflower-gardener Dec 10, 2024
41fe8f6
Automated Code Change
tensorflower-gardener Dec 10, 2024
c9afdf2
Merge pull request #82466 from tensorflow:LakshmiKalaKadali-patch-7
tensorflower-gardener Dec 10, 2024
1618bb8
Automated Code Change
tensorflower-gardener Dec 10, 2024
448bc73
Replace std::string_view with absl::string_view
tensorflower-gardener Dec 10, 2024
634c24a
Automated Code Change
tensorflower-gardener Dec 10, 2024
9e664bb
Annotate Int64 value with jstype=JS_STRING to represent serialized Ja…
tensorflower-gardener Dec 10, 2024
451bb67
Automated Code Change
tensorflower-gardener Dec 10, 2024
fc7cd42
Automated Code Change
tensorflower-gardener Dec 10, 2024
33a8cd5
Automated Code Change
tensorflower-gardener Dec 10, 2024
cbd9e99
Automated Code Change
tensorflower-gardener Dec 10, 2024
208d844
Automated Code Change
tensorflower-gardener Dec 10, 2024
2baa23e
Automated Code Change
tensorflower-gardener Dec 10, 2024
695a715
compat: Update forward compatibility horizon to 2024-12-10
tensorflower-gardener Dec 10, 2024
d877e93
Update GraphDef version to 2072.
tensorflower-gardener Dec 10, 2024
a6a39de
Automated Code Change
tensorflower-gardener Dec 10, 2024
f2bc981
Automated Code Change
tensorflower-gardener Dec 10, 2024
e27bc13
Replace std::string_view with absl::string_view
tensorflower-gardener Dec 10, 2024
2048631
Automated Code Change
tensorflower-gardener Dec 10, 2024
f270c8e
Automated Code Change
tensorflower-gardener Dec 10, 2024
c48ac21
Add debug option for failing the PTX compilation on register spilling
beckerhe Dec 10, 2024
641ca8f
[XLA:GPU] Use absl instead of tensorflow functions/types.
allanrenucci Dec 10, 2024
2b60647
PR #20025: [NVIDIA GPU] LHS enhancement for collective multi-streaming
terryysun Dec 10, 2024
4725eda
[XLA:GPU] Disable cutlass dynamic-update-slice rewrite on V100.
pifon2a Dec 10, 2024
9863b08
Automated Code Change
tensorflower-gardener Dec 10, 2024
b6971fc
first merge commit with conflicts
cj401-amd Dec 10, 2024
d91630a
second commit after resolving merge conflicts
cj401-amd Dec 10, 2024
76f247d
commit after resolving merge conflicts
cj401-amd Dec 10, 2024
252a956
Remove unneeded triton patches
mmakevic-amd Dec 11, 2024
7ec4c63
Remove duplicate tags and unnecessary backend for triton_support_lega…
mmakevic-amd Dec 11, 2024
2464a3f
Remove duplicate targets in /xla/stream_executor/BUILD
mmakevic-amd Dec 11, 2024
8aedd80
Remove duplicate obsolete rules and targets from stream_executor/cuda
mmakevic-amd Dec 11, 2024
071eec5
Fix missing template value (ref https://github.com/openxla/xla/pull/2…
mmakevic-amd Dec 11, 2024
3a90ef0
Remove duplicated lines
mmakevic-amd Dec 11, 2024
67d8f16
Use new implementation for buffer comparison (https://github.com/tens…
mmakevic-amd Dec 11, 2024
ac4cbe0
Added missing absl:: namespace qualifiers to StatusOr
mmakevic-amd Dec 11, 2024
02686ea
update for CI building unit test
cj401-amd Dec 16, 2024
256194f
update with cuda-only for passing unit-test
cj401-amd Dec 17, 2024
e60b72b
update for unit test passed locally and ci building
cj401-amd Dec 17, 2024
7005763
update for skip command_buffer_cmd_test locally tested
cj401-amd Dec 18, 2024
fd54d81
fix for CI building
cj401-amd Dec 18, 2024
b74d59c
Properly skip topk_specializer_test for ROCm
mmakevic-amd Dec 18, 2024
72a9fb8
Remove unnecessary backend for gpu_triton_custom_call_test
mmakevic-amd Dec 18, 2024
d71ec61
fix XLA related unit tests with locally passed
cj401-amd Dec 18, 2024
f22b6c3
update for pycpp test
cj401-amd Dec 19, 2024
2277278
_pywrap_profiler_plugin seems have been messed up somewhere, as a cyc…
cj401-amd Dec 20, 2024
b1e817d
Fix dependency issues of profiler plugin
mmakevic-amd Dec 30, 2024
34e1cc7
Merge branch 'develop-upstream-sync-241210' of github.com:ROCm/tensor…
cj401-amd Jan 2, 2025
b45daeb
update for versions_test
cj401-amd Jan 5, 2025
f84523c
update profiler build
cj401-amd Jan 5, 2025
Remove TENSORFLOW_USE_ROCM #ifdefs in command_buffer_thunk_test.cc.
PiperOrigin-RevId: 703276205
klucke authored and tensorflower-gardener committed Dec 5, 2024
commit 3bbe4585952f90c4beb8541b8e8ec86e29eb4bbd
3 changes: 2 additions & 1 deletion third_party/xla/xla/service/gpu/runtime/BUILD
@@ -462,7 +462,6 @@ xla_test(
         "gpu_b100",
         "gpu_amd_any",
     ],
-    local_defines = if_cuda_is_configured(["GOOGLE_CUDA=1"]) + if_rocm_is_configured(["TENSORFLOW_USE_ROCM=1"]),
     deps = [
         ":command_buffer_cmd",
         ":command_buffer_thunk",
@@ -480,12 +479,14 @@ xla_test(
         "//xla/service/gpu:matmul_utils",
         "//xla/stream_executor:blas",
         "//xla/stream_executor:command_buffer",
+        "//xla/stream_executor:device_description",
         "//xla/stream_executor:device_memory",
         "//xla/stream_executor:device_memory_allocator",
         "//xla/stream_executor:kernel",
         "//xla/stream_executor:kernel_spec",
         "//xla/stream_executor:platform",
         "//xla/stream_executor:platform_manager",
+        "//xla/stream_executor:semantic_version",
         "//xla/stream_executor:stream_executor_h",
         "//xla/stream_executor:stream_executor_memory_allocator",
         "//xla/stream_executor/gpu:gpu_test_kernels",
Original file line number Diff line number Diff line change
@@ -22,6 +22,7 @@ limitations under the License.
 #include <string>
 #include <thread>  // NOLINT
 #include <utility>
+#include <variant>
 #include <vector>

 #include "absl/status/statusor.h"
@@ -40,6 +41,7 @@ limitations under the License.
 #include "xla/shape_util.h"
 #include "xla/stream_executor/blas.h"
 #include "xla/stream_executor/command_buffer.h"
+#include "xla/stream_executor/device_description.h"
 #include "xla/stream_executor/device_memory.h"
 #include "xla/stream_executor/device_memory_allocator.h"
 #include "xla/stream_executor/gpu/gpu_test_kernels.h"
@@ -49,6 +51,7 @@ limitations under the License.
 #include "xla/stream_executor/kernel_spec.h"
 #include "xla/stream_executor/platform.h"
 #include "xla/stream_executor/platform_manager.h"
+#include "xla/stream_executor/semantic_version.h"
 #include "xla/stream_executor/stream_executor.h"
 #include "xla/stream_executor/stream_executor_memory_allocator.h"
 #include "xla/tests/hlo_test_base.h"
@@ -102,13 +105,17 @@ KernelArgsPacking CreateDefaultArgsPacking() {
 }

 // Some of the tests rely on CUDA 12.3+ features.
-bool IsAtLeastCuda12300() {
-#if defined(TENSORFLOW_USE_ROCM)
-  return false;
-#endif
-#if CUDA_VERSION >= 12030
-  return true;
-#endif
+bool IsAtLeastCuda12300(const se::StreamExecutor* executor) {
+  const auto& device_description = executor->GetDeviceDescription();
+  const auto* cuda_cc = std::get_if<se::CudaComputeCapability>(
+      &device_description.gpu_compute_capability());
+  if (cuda_cc != nullptr) {
+    if (device_description.driver_version() >=
+        stream_executor::SemanticVersion(12, 3, 0)) {
+      return true;
+    }
+  }
+
   return false;
 }
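
The hunk above replaces a compile-time `#if` check with a runtime query on the device description. A minimal, self-contained sketch of that pattern follows. `CudaCapability`, `RocmCapability`, `Version`, and `DeviceInfo` are hypothetical stand-ins invented for illustration; they are not the real `se::CudaComputeCapability`, `se::SemanticVersion`, or `se::DeviceDescription` types and only mirror the parts the diff uses.

```cpp
#include <cassert>
#include <tuple>
#include <variant>

// Hypothetical stand-ins for the stream_executor types used in the diff.
struct CudaCapability { int cc_major; int cc_minor; };
struct RocmCapability { const char* gfx_version; };
using GpuCapability = std::variant<CudaCapability, RocmCapability>;

// Hypothetical three-part version with lexicographic ordering.
struct Version {
  int major_part;
  int minor_part;
  int patch_part;
  friend bool operator>=(const Version& a, const Version& b) {
    return std::tie(a.major_part, a.minor_part, a.patch_part) >=
           std::tie(b.major_part, b.minor_part, b.patch_part);
  }
};

// Hypothetical device description: which backend, and which driver version.
struct DeviceInfo {
  GpuCapability capability;
  Version driver_version;
};

// Runtime replacement for the preprocessor check: true only when the device
// is a CUDA device and its driver reports version 12.3.0 or newer.
bool IsAtLeastCuda12300(const DeviceInfo& device) {
  // std::get_if returns nullptr when the variant holds a different type,
  // so a ROCm device falls through to `false` without any #ifdef.
  const auto* cuda = std::get_if<CudaCapability>(&device.capability);
  return cuda != nullptr && device.driver_version >= Version{12, 3, 0};
}

// Small self-check exercising the three interesting cases.
void CheckExamples() {
  assert(IsAtLeastCuda12300(DeviceInfo{CudaCapability{9, 0}, Version{12, 4, 0}}));
  assert(!IsAtLeastCuda12300(DeviceInfo{CudaCapability{8, 0}, Version{12, 2, 0}}));
  assert(!IsAtLeastCuda12300(DeviceInfo{RocmCapability{"gfx90a"}, Version{12, 4, 0}}));
}
```

Because the decision is made against the device that is actually present, the same test source can be compiled once for both backends, which is presumably why the `local_defines` line could be dropped from the BUILD target.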

@@ -593,12 +600,12 @@ TEST(CommandBufferThunkTest, CustomAddKernelLaunchCmd) {
 }

 TEST(CommandBufferThunkTest, GemmCmd) {
-  if (!IsAtLeastCuda12300()) {
+  se::StreamExecutor* executor = GpuExecutor();
+
+  if (!IsAtLeastCuda12300(executor)) {
     GTEST_SKIP() << "CUDA graph tracing is not supported";
   }

-  se::StreamExecutor* executor = GpuExecutor();
-
   TF_ASSERT_OK_AND_ASSIGN(auto stream, executor->CreateStream());

   int64_t lhs_length = sizeof(float) * 2 * 4;
@@ -710,12 +717,12 @@ TEST(CommandBufferThunkTest, GemmCmd) {
 }

 TEST(CommandBufferThunkTest, DynamicSliceFusionCmd) {
-  if (!IsAtLeastCuda12300()) {
+  se::StreamExecutor* executor = GpuExecutor();
+
+  if (!IsAtLeastCuda12300(executor)) {
     GTEST_SKIP() << "CUDA graph tracing is not supported";
   }

-  se::StreamExecutor* executor = GpuExecutor();
-
   TF_ASSERT_OK_AND_ASSIGN(auto stream, executor->CreateStream());

   int64_t lhs_length = sizeof(float) * 4 * 4;
@@ -866,12 +873,12 @@ TEST(CommandBufferThunkTest, DynamicSliceFusionCmd) {
 }

 TEST(CommandBufferThunkTest, CublasLtCmd) {
-  if (!IsAtLeastCuda12300()) {
+  se::StreamExecutor* executor = GpuExecutor();
+
+  if (!IsAtLeastCuda12300(executor)) {
     GTEST_SKIP() << "CUDA graph tracing is not supported";
   }

-  se::StreamExecutor* executor = GpuExecutor();
-
   TF_ASSERT_OK_AND_ASSIGN(auto stream1, executor->CreateStream());
   TF_ASSERT_OK_AND_ASSIGN(auto stream2, executor->CreateStream());

@@ -1126,12 +1133,12 @@ TEST(CommandBufferThunkTest, MultipleLaunchCmd) {
 }

 TEST(CommandBufferThunkTest, IfCmd) {
-  if (!IsAtLeastCuda12300()) {
+  se::StreamExecutor* executor = GpuExecutor();
+
+  if (!IsAtLeastCuda12300(executor)) {
     GTEST_SKIP() << "CUDA graph conditionals are not supported";
   }

-  se::StreamExecutor* executor = GpuExecutor();
-
   TF_ASSERT_OK_AND_ASSIGN(auto stream, executor->CreateStream());

   int64_t length = 4;
@@ -1214,12 +1221,12 @@ TEST(CommandBufferThunkTest, IfCmd) {
 }

 TEST(CommandBufferThunkTest, IfElseCmd) {
-  if (!IsAtLeastCuda12300()) {
+  se::StreamExecutor* executor = GpuExecutor();
+
+  if (!IsAtLeastCuda12300(executor)) {
     GTEST_SKIP() << "CUDA graph conditionals are not supported";
   }

-  se::StreamExecutor* executor = GpuExecutor();
-
   TF_ASSERT_OK_AND_ASSIGN(auto stream, executor->CreateStream());

   int64_t length = 4;
@@ -1307,12 +1314,12 @@ TEST(CommandBufferThunkTest, IfElseCmd) {
 }

 TEST(CommandBufferThunkTest, CaseCmd) {
-  if (!IsAtLeastCuda12300()) {
+  se::StreamExecutor* executor = GpuExecutor();
+
+  if (!IsAtLeastCuda12300(executor)) {
     GTEST_SKIP() << "CUDA graph conditionals are not supported";
   }

-  se::StreamExecutor* executor = GpuExecutor();
-
   TF_ASSERT_OK_AND_ASSIGN(auto stream, executor->CreateStream());

   int64_t length = 4;
@@ -1396,12 +1403,12 @@ TEST(CommandBufferThunkTest, CaseCmd) {
 }

 TEST(CommandBufferThunkTest, ForCmd) {
-  if (!IsAtLeastCuda12300()) {
+  se::StreamExecutor* executor = GpuExecutor();
+
+  if (!IsAtLeastCuda12300(executor)) {
     GTEST_SKIP() << "CUDA graph conditionals are not supported";
   }

-  se::StreamExecutor* executor = GpuExecutor();
-
   TF_ASSERT_OK_AND_ASSIGN(auto stream, executor->CreateStream());

   int64_t length = 4;