-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dllehr tbe forward #24
Conversation
@@ -254,10 +253,13 @@ __device__ void split_tbe_forward_hip_kernel( | |||
if(bag_id >= batch) | |||
return ; | |||
int lane_id = threadIdx.x & (AMDGCN_WAVE_SIZE - 1); | |||
|
|||
int num_loads = (emb_dim + AMDGCN_WAVE_SIZE - 1) >> 6; | |||
if( lane_id * num_loads > emb_dim) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is not enough. If we're <192 we're reading 3 columns per thread. which means we'll have an uneven read at the end of a row. This may require load_row_per_warp and others, to only do partial reads for threads at the end of a warp.
Additional issue, 64, 128, and 256 use 1, 2, 4 reads per thread respectively...which we can easily sort out, but that 3 reads causes us to not be aligned.
@@ -0,0 +1,467 @@ | |||
/******************************************************************************* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is a copy of the original split_tbe_fwd_hip.cpp file. I attempted to start removing constexpr's and using emb_dim for dword_per_row. I think this will need to be adjusted no matter what decision we make, but I'm still reviewing it.
The current implementation causes memory faults when running with kernels != to 64/128/64/256 etc. but any attempts to add printf or other debugging techniques cause the llvm compiler to take seg faults. Not sure if folks have seen this before? |
Ifu main 2022 04 20 targeting on upstream 19c17b3
No description provided.