-
Notifications
You must be signed in to change notification settings - Fork 235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dropout: make seed and states_num kernel arguments #2277
Conversation
@atamazov could you help to take a look? |
@JehandadKhan sorry, called |
@tbennun The CL file needs to formatted as well. |
@JehandadKhan Done now. I thought I had formatted both. |
std::to_string(sizeof(prngStates)) + "x" + | ||
std::to_string(rng_mode) + "x" + std::to_string(prng_seed) + "x" + | ||
std::to_string(wk_grp_num); | ||
std::string network_config = "initprngs-" + std::to_string(sizeof(prngStates)) + "x" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use sdt::ostringstream
instead of concatenations if there are more than two strings. It's both faster and easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I would do this, but the rest of the file (as well as other files such as softmaxocl.cpp
) use this style of code. Would it be ok to not match the style of the rest of the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, otherwise it would never really change.
src/ocl/dropoutocl.cpp
Outdated
|
||
auto&& kernels = handle.GetKernels(kernel_name, network_config); | ||
if(!kernels.empty()) | ||
{ | ||
kernels.front()(prng_states); | ||
kernels.front()(prng_states, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe prng_seed
and states_num
should be declared as unsigned
? That would make calls single lines and remove the casts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed (through a different way). Changing prng_seed to be unsigned
would change the MIOpen C API (in which miopenSetDropoutDescriptor
uses unsigned long long
). I'd rather pass in an unsigned long long value, which I now do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have missed that. That's good now.
@tbennun could you resolve the review opinions above? We will try to get this merged. |
@junliume I now addressed the two comments, see above and new commit. |
* 3D group forward convolution solver (#2286) * [HotFix] Fix build issue after #2286 (#2328) * Fix ConvCkIgemmFwdV6r1DlopsNchw solver to reflect that it's not dynamic (#2325) * Remove target IDs from kdb entries (#2309) * Dropout: make seed and states_num kernel arguments (#2277) * [MI250] Adding missing kernel objects (#2329) * Post-merge fixups: Replace environment variable check with problem config check and reduce lambda capture for Invoker obj (#2305) * [HotFix][CI] fix HIP tidy issue from #2277 (#2335) * [HotFix] Update requirements.txt MLIR ignore PATH for Python conda LLD (#2324) * [NFC] Replace long integers with int64_t and size_t for better compatibility with Windows (#2323) * Windows compatibility: replace long integers with int64_t and size_t, replace uint with unsigned int, replace long with long long for numbers, proper casting * Fix formatting * Fix 3d group forward convolution * Resolve review comments * Fix formatting --------- Co-authored-by: Daming Feng <[email protected]> Co-authored-by: Evgenii Averin <[email protected]> Co-authored-by: JD <[email protected]> Co-authored-by: Tal Ben-Nun <[email protected]> Co-authored-by: amberhassaan <[email protected]> Co-authored-by: Jun Liu <[email protected]>
@atamazov could you comment on the user's question?
CC: @zjing14 |
@atamazov This refers to the two included files in the Dropout kernel, which initialize large arrays in constant memory: |
@tbennun @junliume AFAICS the parts of these matrices are copied to the private memory for a purpose, i.e. each workitem modifies its local copy. If we want to put those into global memory, then we need to allocate some workspace (enough to hold data from matrices) and then manually manage it, which doesn't seem easy to me. But why we may need to save space in private segment? Are we hitting the limit? /cc @zjing14 |
@atamazov yes we are hitting a limit on constant memory, with this fix it "gives a reduction in initialization time of a Transformer model from 10:22 minutes to 31 seconds." We should take a look at the OCL code to see if we can work on heap memory instead. |
PRNG state requires some workspace in any case, but it supposed to be tiny, but it should be double-checked. But to be sure we need to re-investigate our GPU PRNG implementation. |
@CAHEK7 Thanks for feedback. Now we at least better understand what to begin with. |
Every time a dropout descriptor is created (even if it's with a different seed), the initialization kernel is recompiled. This PR turns the number of states to initialize and random seed into runtime parameters. There seems to be no effect on the runtime of the initialization kernel (which I'm assuming is not performance critical), but fewer JIT compilation calls.
Note: I tried setting
prng_seed
to be anunsigned long long
parameter, but that passed a wrong value onto the kernel itself.