Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable HIP assert in SLIC macros #1498

Merged
merged 4 commits into from
Feb 6, 2025
Merged

Enable HIP assert in SLIC macros #1498

merged 4 commits into from
Feb 6, 2025

Conversation

bmhan12
Copy link
Contributor

@bmhan12 bmhan12 commented Jan 29, 2025

This PR

  • Enables assert for HIP in the SLIC_ASSERT_* and SLIC_CHECK_* macros.

EDIT:

  • Fixes the CMAKE_CXX_FLAGS_DEBUG to actually allow assert() for rocm builds.
  • Updated axom spack recipe and host-configs to reflect CMAKE_CXX_FLAGS_DEBUG change
    • Note: %cce debug builds require some optimization (-O1) to compile without cray compiler errors (error is of the form: PLEASE submit a bug report to Cray and include the crash backtrace, preprocessed source, and associated run script. Stack dump: ... ).
  • Remove checks for NDEBUG in mesh_tester

EDIT 2:

  • Small change to sbatch option for Gitlab CI ruby jobs: res=ci --> reservation=ci. res is no longer recognized as an option:
sbatch: option '--res=ci' is ambiguous; possibilities: '--reservation' '--resv-ports'

@bmhan12 bmhan12 added Slic Issues related to Axom's 'slic' component Hip Issues related to Hip labels Jan 29, 2025
@bmhan12 bmhan12 force-pushed the feature/han12/hip_assert branch from d65797a to 453111d Compare January 29, 2025 17:40
Copy link
Member

@kennyweiss kennyweiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bmhan12

Please update the RELEASE-NOTES to indicate that SLIC_ASSERTs now work w/ hip as well as restrictions (if any)

Comment on lines 494 to 502
// Use assert when on device
#elif defined(AXOM_DEBUG) && defined(AXOM_DEVICE_CODE)
#if !defined(__HIP_DEVICE_COMPILE__)
#define SLIC_ASSERT(EXP) assert(EXP)
#define SLIC_ASSERT_MSG(EXP, msg) assert(EXP)
#define SLIC_CHECK(EXP) assert(EXP)
#define SLIC_CHECK_MSG(EXP, msg) assert(EXP)
#else
// AXOM_DEBUG is defined so the expression and message cannot be ignored
// as they probably contain parameters that were decorated with AXOM_UNUSED_PARAM
// which will require the expression and message to be used to avoid warnings.
// Here we use them in a way that will not really have a runtime effect.
// The msg code block may contain << operators so we need a stream-like object
// but we enclose it in "if(false)" so it never executes.
namespace axom
{
namespace slic
{
namespace internal
{
class blackhole
{ };
// Write an object to a blackhole.
template <typename T>
AXOM_HOST_DEVICE inline blackhole &operator<<(blackhole &bh, T)
{
return bh;
}
} // namespace internal
} // namespace slic
} // namespace axom

#define SLIC_ASSERT(EXP) ((void)(EXP))
#define SLIC_ASSERT_MSG(EXP, msg) \
{ \
if(false) \
{ \
((void)(EXP)); \
axom::slic::internal::blackhole __oss; \
__oss << msg; \
} \
}
#define SLIC_CHECK(EXP) ((void)(EXP))
#define SLIC_CHECK_MSG(EXP, msg) \
{ \
if(false) \
{ \
((void)(EXP)); \
axom::slic::internal::blackhole __oss; \
__oss << msg; \
} \
}
#endif
#define SLIC_ASSERT(EXP) assert(EXP)
#define SLIC_ASSERT_MSG(EXP, msg) assert(EXP)
#define SLIC_CHECK(EXP) assert(EXP)
#define SLIC_CHECK_MSG(EXP, msg) assert(EXP)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a version of hip/rocm where this starts working and an earlier version where it doesn't work?

}
#endif
#define SLIC_ASSERT(EXP) assert(EXP)
#define SLIC_ASSERT_MSG(EXP, msg) assert(EXP)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there restrictions on msg?
E.g. can only be a simple c-style string and not an iostream expression or fmt formatted string?
Please not these restrictions, if any, here

@bmhan12 bmhan12 force-pushed the feature/han12/hip_assert branch from 6e6a3b2 to 35b5e4b Compare February 4, 2025 17:24
…recipe accordingly, and undo NDEBUG requirement for mesh_tester hip policy
@bmhan12 bmhan12 force-pushed the feature/han12/hip_assert branch from 716bde3 to f902598 Compare February 5, 2025 19:40
@bmhan12 bmhan12 added Build system Issues related to Axom's build system CI Issues related to continuous integration labels Feb 5, 2025
@bmhan12
Copy link
Contributor Author

bmhan12 commented Feb 5, 2025

Re-requesting reviews since the PR has expanded to include host-config changes and a Gitlab CI change to fix ruby jobs after the LC update.
I've updated the PR description to reflect the new changes.

@@ -181,7 +181,7 @@ void Input::parse(int argc, char** argv, axom::CLI::App& app)
#ifdef AXOM_USE_CUDA
pol_sstr << "\nSet to 'raja_cuda' or 3 to use the RAJA CUDA policy.";
#endif
#if defined(AXOM_USE_HIP) && defined(NDEBUG)
#if defined(AXOM_USE_HIP)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't related to your changes, but is there a better way to do this than to have integral values, which IMO are opaque. Maybe, leave those out of the string and just say Set to 'raja_omp' etc.?

Copy link
Member

@rhornung67 rhornung67 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @bmhan12

Copy link
Contributor

@gunney1 gunney1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @bmhan12

@@ -7,7 +7,7 @@
# This is the shared configuration of jobs for ruby
.on_ruby:
variables:
SCHEDULER_PARAMETERS: "--res=ci --exclusive=user --deadline=now+1hour -N1 -t ${ALLOC_TIME}"
SCHEDULER_PARAMETERS: "--reservation=ci --exclusive=user --deadline=now+1hour -N1 -t ${ALLOC_TIME}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is blocking other PRs. Can it be pulled into its own PR? Or can this be PR be merged?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge it.

@bmhan12 bmhan12 merged commit 95c0a46 into develop Feb 6, 2025
13 checks passed
@bmhan12 bmhan12 deleted the feature/han12/hip_assert branch February 6, 2025 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build system Issues related to Axom's build system CI Issues related to continuous integration Hip Issues related to Hip Slic Issues related to Axom's 'slic' component
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants