Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sycl Half #1710

Open
wants to merge 19 commits into
base: half_batch
Choose a base branch
from
Open

Sycl Half #1710

wants to merge 19 commits into from

Conversation

yhmtsai
Copy link
Member

@yhmtsai yhmtsai commented Oct 25, 2024

This PR will be necessary to support half in sycl when we still use gko::half in the host.
It creates the mapping like cuda_type or hip_type in sycl part, and apply it to those variables with value type

It will be merged/checked when we have half and intel ci

@yhmtsai yhmtsai added this to the Ginkgo 1.9.0 milestone Oct 25, 2024
@yhmtsai yhmtsai self-assigned this Oct 25, 2024
@ginkgo-bot ginkgo-bot added mod:cuda This is related to the CUDA module. type:solver This is related to the solvers type:preconditioner This is related to the preconditioners type:matrix-format This is related to the Matrix formats mod:hip This is related to the HIP module. type:factorization This is related to the Factorizations type:stopping-criteria This is related to the stopping criteria mod:dpcpp This is related to the DPC++ module. labels Oct 25, 2024
@yhmtsai yhmtsai requested a review from a team October 25, 2024 12:51
@MarcelKoch MarcelKoch self-requested a review November 11, 2024 11:06
CMakeLists.txt Show resolved Hide resolved
Comment on lines 142 to 144
// TODO: check whether mac compiler always use complex version even when real
// half
#define COMPLEX_HALF_OPERATOR(_op, _opeq) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this TODO been addressed?

Comment on lines 275 to 274
// unsupported
auto old = *addr;
*addr += val;
return old;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's unsupported, we should not compile it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a workaround, we could use an atomic_cas loop

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this workaround, would creating a non-std complex implementation with sycl::half work, e.g. gko::complex<sycl::half>? Then you could do the usual device type mapping stuff to actually use this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to this idea, though we should keep the implementation internal to Ginkgo to allow future changes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, of course, it should be a private type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the complex is internal type. They will not expose it to public.
We only add the dependence with -fsycl, which is only in the dpcpp backend.
I wonder it affect the performance especially the intel complex mapped std::complexsycl::half to __spv::complex_half, which might be their internal type?
In gko::half and sycl::half, I have seen 1.5 or 2x slowdown by using gko::half.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So can you also use _spv::complex_half here? It sounds a bit like you could just map std::complex<gko::half> to that type and don't need to introduce a new type at all.

Comment on lines 186 to 217
// after providing std::complex<sycl::half>, we can load their <complex> to
// complete the header chain.

#if GINKGO_DPCPP_MAJOR_VERSION > 7 || \
(GINKGO_DPCPP_MAJOR_VERSION == 7 && GINKGO_DPCPP_MINOR_VERSION >= 1)

#if defined(__has_include_next)
// GCC/clang support go through this path.
#include_next <complex>
#else
// MSVC doesn't support "#include_next", so we take the same workaround in
// stl_wrappers/complex.
#include <../stl_wrappers/complex>
#endif

#else


#include <complex>


#endif

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to just remove this and always include <complex>.
Some of the comments can also be adjusted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will be a normal header again after everything works well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:ready-for-review This PR is ready for review mod:cuda This is related to the CUDA module. mod:dpcpp This is related to the DPC++ module. mod:hip This is related to the HIP module. type:factorization This is related to the Factorizations type:matrix-format This is related to the Matrix formats type:preconditioner This is related to the preconditioners type:solver This is related to the solvers type:stopping-criteria This is related to the stopping criteria
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants