Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Upgrade to ROCm 4.3.1 [WORKAROUND] Disable dynamic reduction by default for HIP backend && ROCm 4.3 #1125

Merged
merged 7 commits into from
Aug 28, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ RUN dpkg --add-architecture i386

RUN if [ "$USE_MLIR" = "ON" ] ; \
then export ROCM_APT_VER=.apt_4.2;\
else export ROCM_APT_VER=.apt_4.3; \
else export ROCM_APT_VER=.apt_4.3.1; \
fi && \
echo $ROCM_APT_VER &&\
sh -c 'echo deb [arch=amd64 trusted=yes] http://repo.radeon.com/rocm/apt/$ROCM_APT_VER/ xenial main > /etc/apt/sources.list.d/rocm.list'
Expand Down
19 changes: 14 additions & 5 deletions src/reducetensor.cpp
100755 → 100644
Original file line number Diff line number Diff line change
Expand Up @@ -43,9 +43,10 @@
#include <iostream>
#include <sstream>

MIOPEN_DECLARE_ENV_VAR(MIOPEN_DISABLE_DYNAMIC_REDUCTION);
MIOPEN_DECLARE_ENV_VAR(MIOPEN_DEBUG_DYNAMIC_REDUCTION);
atamazov marked this conversation as resolved.
Show resolved Hide resolved

#define WORKAROUND_MIOPEN_ISSUE_557 1
#define WORKAROUND_ISSUE_1123 ((HIP_PACKAGE_VERSION_FLAT == 4003000000ULL) && MIOPEN_BACKEND_HIP)
Copy link
Contributor

@atamazov atamazov Aug 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong. For 4.3 HIP version is 4.3.21300, For 4.3.1 it is 4.3.21331 (according to CI logs). This must be changed to

#define WORKAROUND_ISSUE_1123 (HIP_PACKAGE_VERSION_FLAT >= 4003000000ULL && HIP_PACKAGE_VERSION_FLAT <= 4003021300ULL && MIOPEN_BACKEND_HIP)

Otherwise dynamic reduction will NOT be automatically enabled for 4.3.1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junliume My bad; I've forgot the details of the HIP release numbering scheme.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed it during review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RIGHT NOW WORKAROUND IS NEVER ENABLED.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atamazov No problem! Testing in this draft now: #1126
Will create separate PRs for the double purposed draft.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http://micimaster.amd.com/blue/organizations/jenkins/MLLibs%2FMIOpen/detail/ci_gfx1030_ocl_to_hip/3/pipeline/11

[2021-08-30T20:31:02.411Z] 79/82 Test #80: test_reduce_double ......................... Passed 880.12 sec

This particular test is still okay but the other tests were terminated somehow. I'll keep monitoring.


namespace miopen {

Expand All @@ -59,6 +60,15 @@ enum ReductionMethod_t

namespace detail {

static bool IsDynamicReductionEnabled()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Recommendation] Buttery butter ;) Either static or namespace.

Suggested change
static bool IsDynamicReductionEnabled()
bool IsDynamicReductionEnabled()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Recommendation] Use unnamed namespace.

{
#if WORKAROUND_ISSUE_1123
return miopen::IsEnabled(MIOPEN_DEBUG_DYNAMIC_REDUCTION{});
#else
return !miopen::IsDisabled(MIOPEN_DEBUG_DYNAMIC_REDUCTION{});
#endif
}

struct get_tunable_reduction_kernel_constants
{
int GredThreadBufferLength;
Expand Down Expand Up @@ -460,7 +470,7 @@ std::size_t ReduceTensorDescriptor::GetWorkspaceSize(const Handle& handle,
64 + sizeof(int);

// dynamic reduction use one additional page for storing tensor descriptors
if(!miopen::IsEnabled(MIOPEN_DISABLE_DYNAMIC_REDUCTION{}))
if(detail::IsDynamicReductionEnabled())
wsSizeInBytes += 4096;

return (wsSizeInBytes);
Expand Down Expand Up @@ -570,8 +580,7 @@ void ReduceTensorDescriptor::ReduceTensor(const Handle& handle,
const auto invariantLength = cDesc.GetElementSize();
const auto toReduceLength = aDesc.GetElementSize() / invariantLength;

const int blockSize =
miopen::IsEnabled(MIOPEN_DISABLE_DYNAMIC_REDUCTION{}) ? 256 : tunable->BlockSize;
const int blockSize = detail::IsDynamicReductionEnabled() ? tunable->BlockSize : 256;
detail::ReductionKernelConfigurator configurator(blockSize, handle.GetWavefrontWidth());

const ReductionMethod_t reduceImpl =
Expand Down Expand Up @@ -606,7 +615,7 @@ void ReduceTensorDescriptor::ReduceTensor(const Handle& handle,
? static_cast<float>(*reinterpret_cast<const double*>(beta))
: *reinterpret_cast<const float*>(beta);

if(miopen::IsEnabled(MIOPEN_DISABLE_DYNAMIC_REDUCTION{}))
if(!detail::IsDynamicReductionEnabled())
{ // use static reduction
std::vector<std::size_t> invariantLengths;
std::vector<std::size_t> invariantStrides;
Expand Down