Add API for non-caching-load #1289

ranapratap55 · 2024-08-09T09:26:03Z

RCCL provides "low latency" protocols for communication between agents, where the entire message, consisting of data and flags, is packed into a single L2 cache line. This is usually accomplished using relaxed atomic instructions in LLVM. However, the 128-byte version of this protocol (LL-128) requires 128-bit load and store instructions that bypass the cache and are not split into multiple instructions. The nontemporal builtin is not always suitable for this use case.

The proposed approach is to provide a C++ function template that encapsulates an inline assembly call. This asm is intended to use the appropriate load/store parameters for each combination of data size and architecture.
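As a rough illustration of the shape such a template could take, here is a minimal host-side sketch. The name `non_caching_load_sketch` and the `Vec128` type are hypothetical and do not come from this PR. The per-architecture inline assembly is deliberately left out and replaced by a plain copy, so this only shows the size checks and the single entry point, not the actual cache-bypassing behavior.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical 16-byte payload standing in for a 128-bit element.
struct Vec128 {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Hypothetical name: the function actually added by the PR is not shown in this thread.
template <typename T>
inline T non_caching_load_sketch(const T* src) {
  static_assert(std::is_trivially_copyable<T>::value,
                "only trivially copyable types can be loaded bytewise");
  static_assert(sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 ||
                    sizeof(T) == 8 || sizeof(T) == 16,
                "unsupported load width");
  // Assumption: on a GPU target, each supported width would map to one
  // inline asm load chosen per architecture. That asm is omitted here;
  // this host fallback is an ordinary copy and does NOT bypass any cache.
  T out;
  std::memcpy(&out, src, sizeof(T));
  return out;
}
```

Keeping the width check in a `static_assert` means an unsupported element size fails at compile time rather than silently falling back to a multi-instruction load.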

AlexVlx · 2024-08-09T14:52:37Z

Have you actually verified that the byte and two-byte load instructions you are using exist on the ISAs you expect them to exist on? If you have, perhaps you want to check again, carefully, whether they exist on GFX9 and GFX10? Has this been tested at all? Should it not have some unit tests in tow?

ranapratap55 · 2024-10-08T05:26:59Z

> Have you actually verified that the byte and two-byte load instructions you are using exist on the ISAs you expect them to exist on? If you have, perhaps you want to check again, carefully, whether they exist on GFX9 and GFX10? Has this been tested at all? Should it not have some unit tests in tow?

Updated the patch with byte and 2-byte loads and added test cases.

ranapratap55 · 2024-10-15T06:10:05Z

ping.

wenkaidu · 2024-10-15T15:38:47Z

> ping.

Can you try to integrate this into https://github.com/ROCm/rccl/blob/develop/tools/p2p-latency-test/ll_latency_test.cpp?

ranapratap55 requested review from wenkaidu, gilbertlee-amd, akolliasAMD, edgargabriel, PedramAlizadeh, nusislam, nileshnegi, KawtharShafie, AtlantaPepsi, mberenjk, corey-derochie-amd and haripriya-amd as code owners August 9, 2024 09:26

ranapratap55 force-pushed the ranapratap55/non-caching-load branch 2 times, most recently from 74cc26c to a52284b Compare October 8, 2024 05:20

ranapratap55 force-pushed the ranapratap55/non-caching-load branch 2 times, most recently from 81910cd to 6a5370a Compare October 8, 2024 05:30

Add API for non-caching-load

24372dc

ranapratap55 force-pushed the ranapratap55/non-caching-load branch from 6a5370a to 24372dc Compare October 15, 2024 05:18

wenkaidu approved these changes Oct 18, 2024
