Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename oneMKL Interface to oneMath #602

Merged
merged 46 commits into from
Dec 3, 2024

Conversation

Rbiessy
Copy link
Contributor

@Rbiessy Rbiessy commented Oct 21, 2024

Description

Related RFC: #564 and related spec PR: uxlfoundation/oneAPI-spec#596
As a reminder the plan is to have this PR and the spec PR approved before we move this repository to https://github.com/uxlfoundation and merge the 2 PRs.

  • The PR has the following implications on the MKLCPU and MKLGPU backends:
    • Had to introduce src/include/common_onemkl_conversion.hpp to convert oneMath common types to oneMKL types. This wasn't needed before as the backends relied on the 2 projects to use the same namespace. The file also provide helper macros to catch oneMKL exceptions and rethrow them as oneMath exceptions.
    • The DFT MKLGPU backend used to catch and rethrow some but not all oneapi::mkl exceptions when committing a descriptor. We discussed this issue by email recently. With the changes needed by the renaming, we now catch and rethrow all the exceptions as oneapi::math exceptions and I was able to remove a related workaround in the tests (FYI @oneapi-src/onemkl-dft-write). Also see related discussion below: Rename oneMKL Interface to oneMath #602 (comment)
    • The RNG MKLGPU backend relied heavily on the 2 libraries using the same namespace. I introduced onemkl_distribution_conversion.hpp to convert the RNG types and call the backend's free functions such as oneapi::mkl::rng::generate, oneapi::mkl::rng::skip_ahead. @oneapi-src/onemkl-rng-write you may want to carefully review src/rng/backends/mklgpu/ as I am not familiar with the RNG domain.
    • The renaming made it clear that some header files were duplicating function declarations for no reason in the BLAS and LAPACK domains. They are removed in baeb476.
  • The PR also remove the domain documentation for BLAS and LAPACK which duplicates the specification (commit b382291). We discussed in the issue [BLAS] Wrong namespace usage in BLAS Documentation #489 that the duplicated documentation was an issue and some of us are keen to remove it. Now is a good time as it becomes even more outdated with the name change. (@oneapi-src/onemkl-blas-write and @oneapi-src/onemkl-lapack-write FYI)
  • The changes assume that the teams @oneapi-src/onemkl-* will be renamed (or duplicated) to @uxlfoundation/onemath-* in the new repository (@rscohn2 FYI)
  • The changes assume that the onemkl Slack channel will be renamed to onemath (@rodburns FYI)
  • Removed USE_MKLREF which was only used in Lapack tests and was not documented. This macro wouldn't work easily with the namespace change anymore (@oneapi-src/onemkl-lapack-write FYI)
  • For reference the remaining occurrences of MKL are:
    • Intel backend names MKLCPU and MKLGPU.
    • Usages of oneapi::mkl inside the Intel backends.
    • File cmake/mkl/MKLConfig.cmake (could probably be removed IMO).
    • [email protected] in CODE_OF_CONDUCT.md and CONTRIBUTING.md. Let me know if there is any plan to create [email protected]. If not this can be updated separately.
    • Folder include/oneapi/mkl for deprecated headers and deprecated oneapi::mkl C++ namespace.
    • Deprecated ONEMKL CMake namespace.
    • Some files in the MKLCPU and MKLGPU backends use "mkl" in their name. They could be updated to onemkl if needed.

Checklist

All Submissions

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Nov 7, 2024

I fixed some more recent conflicts.
@oneapi-src/onemkl-blas-write, @oneapi-src/onemkl-lapack-write in the last commit (baeb476) I have removed a couple of header files in blas and lapack which were duplicating function declarations for the MKLCPU and MKLGPU backends. Given that we can simply include the Intel oneMKL headers instead they can be removed here. This was discussed in #606 (comment)
Log testing the MKLCPU and MKLGPU backends with the 2025.0 base toolkit: intel_mklcpu_mklgpu.txt

Copy link
Contributor

@andrewtbarker andrewtbarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BLAS changes look good to me.

docs/building_the_project_with_dpcpp.rst Outdated Show resolved Hide resolved
@sknepper
Copy link
Contributor

sknepper commented Nov 8, 2024

/intelci: run

Copy link
Contributor

@anantsrivastava30 anantsrivastava30 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving on behalf of DFT

@toxicscum
Copy link

/intelci: run

@sknepper
Copy link
Contributor

Thanks for all your work here - there are so many changes, so I know you've spent a lot of effort on this!
It looks like there are some problems building on Windows. For example, I'm seeing an error for the BLAS domain like:

[1/155] C:\cache\stash\oneapi-compiler\20240719_rls\win\package\compiler\latest\bin\icx.exe -fsycl /nologo -IC:\temp\ec\mkltest\auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d\sources\src -IC:\temp\ec\mkltest\auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d\sources\src\include -IC:\temp\ec\mkltest\auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d\build\bin -IC:\cache\stash\onemkl\regular\2024.2.1\20240722\build__release_win\include /EHsc /DWIN32 /D_WINDOWS /W3 /GR /EHsc -Wno-unused-function -w /O2 /Ob2 /DNDEBUG -iquote C:/temp/ec/mkltest/auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d/sources/include -DMKL_ILP64 -fsycl -QMD -QMT bin\blas\backends\mklcpu\CMakeFiles\onemath_blas_mklcpu_obj.dir\mklcpu_level2.cpp.obj -QMF bin\blas\backends\mklcpu\CMakeFiles\onemath_blas_mklcpu_obj.dir\mklcpu_level2.cpp.obj.d /Fobin\blas\backends\mklcpu\CMakeFiles\onemath_blas_mklcpu_obj.dir\mklcpu_level2.cpp.obj -c C:\temp\ec\mkltest\auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d\sources\src\blas\backends\mklcpu\mklcpu_level2.cpp
FAILED: bin/blas/backends/mklcpu/CMakeFiles/onemath_blas_mklcpu_obj.dir/mklcpu_level2.cpp.obj
C:\cache\stash\oneapi-compiler\20240719_rls\win\package\compiler\latest\bin\icx.exe -fsycl /nologo -IC:\temp\ec\mkltest\auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d\sources\src -IC:\temp\ec\mkltest\auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d\sources\src\include -IC:\temp\ec\mkltest\auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d\build\bin -IC:\cache\stash\onemkl\regular\2024.2.1\20240722\build__release_win\include /EHsc /DWIN32 /D_WINDOWS /W3 /GR /EHsc -Wno-unused-function -w /O2 /Ob2 /DNDEBUG -iquote C:/temp/ec/mkltest/auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d/sources/include -DMKL_ILP64 -fsycl -QMD -QMT bin\blas\backends\mklcpu\CMakeFiles\onemath_blas_mklcpu_obj.dir\mklcpu_level2.cpp.obj -QMF bin\blas\backends\mklcpu\CMakeFiles\onemath_blas_mklcpu_obj.dir\mklcpu_level2.cpp.obj.d /Fobin\blas\backends\mklcpu\CMakeFiles\onemath_blas_mklcpu_obj.dir\mklcpu_level2.cpp.obj -c C:\temp\ec\mkltest\auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d\sources\src\blas\backends\mklcpu\mklcpu_level2.cpp
icx: error: cannot specify '-Fobin\blas\backends\mklcpu\CMakeFiles\onemath_blas_mklcpu_obj.dir\mklcpu_level2.cpp.obj' when compiling multiple source files

And somewhat similar for LAPACK (and RNG); e.g.,
icx: error: cannot specify '-Fobin\lapack\backends\mklcpu\CMakeFiles\onemath_lapack_mklcpu_obj.dir\mkl_lapack.cpp.obj' when compiling multiple source files

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Nov 13, 2024

Thanks @sknepper I have been able to identify that the issue is due to using -iquote flag here which is not supported on Windows.
This is needed because we now need to include the Intel oneMKL mkl.hpp file to have a declaration of the oneapi::mkl symbols. Including this file conflicts with the mkl.hpp in this project which does not provide the right symbols anymore.
I am looking for another solution other than using -iquote but I am not confident I can find a nice solution. This is a difficult problem to solve and the main reason why I think different projects should not provide the same header files.

In terms of review it is still possible to review most of the changes. I expect the changes needed will only affect CMake files.

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Nov 14, 2024

@toxicscum @sknepper I have found a decent solution which should fix the compilation issue on Windows in a2de0e5.
I have described the issue and the solution I used in this comment: a2de0e5#diff-148715d6ea0c0ea0a346af3f6bd610d010d490eca35ac6a9b408748f7ca9e3f4R51

@sknepper
Copy link
Contributor

/intelci: run

Copy link
Contributor

@sknepper sknepper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @Rbiessy - what an impressive set of changes!
LAPACK looks good - thanks for removing the duplicated headers and documentation - that will streamline things in the future.
The USE_MKLREF in the tests was just a convenience feature to avoid a dependency on Netlib; in the future, if we create standalone testing, the same goal will have been achieved. So that's fine to remove this undocumented macro.

The latest round of precommit testing is still running, but it does indeed look like the Windows build issues were resolved - much thanks!

Copy link
Contributor

@mkrainiuk mkrainiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me, thank you! I have two minor comments.

README.md Outdated Show resolved Hide resolved

You can also join the mailing lists for the [UXL Foundation](https://lists.uxlfoundation.org/g/main/subgroups) to be informed of when meetings are happening and receive the latest information and discussions.

---

## Contributing

You can contribute to this project and also contribute to [the specification for this project](https://oneapi-spec.uxlfoundation.org/specifications/oneapi/latest/elements/onemkl/source/). Please read the [CONTRIBUTING](CONTRIBUTING.md) page for more information. You can also contact oneMKL developers and maintainers via [UXL Foundation Slack](https://slack-invite.uxlfoundation.org/) using [#onemkl](https://uxlfoundation.slack.com/archives/onemkl) channel.
You can contribute to this project and also contribute to [the specification for this project](https://oneapi-spec.uxlfoundation.org/specifications/oneapi/latest/elements/onemath/source/). Please read the [CONTRIBUTING](CONTRIBUTING.md) page for more information. You can also contact oneMath developers and maintainers via [UXL Foundation Slack](https://slack-invite.uxlfoundation.org/) using [#onemath](https://uxlfoundation.slack.com/archives/onemath) channel.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know who can update the slack channel name? Does it make sense to introduce new channel and close the old one once these changes will be merged?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know who can update the slack channel name? Does it make sense to introduce new channel and close the old one once these changes will be merged?

I can rename the channel or create a new one. It might make sense to retain the existing channel though I am open to whatever the consensus is. Just drop me a message when you are ready.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rodburns! Renaming'd be perfect option if possible.

Co-authored-by: Maria Kraynyuk <[email protected]>
@Rbiessy
Copy link
Contributor Author

Rbiessy commented Nov 26, 2024

@spencerpatty gentle ping since you have approved the PR for the specification renaming but not the implementation.
Also pinging @oneapi-src/onemkl-rng-write if you are able to review the 2 renaming PRs before November 29 that'd be great.
Thanks

Copy link

@spencerpatty spencerpatty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed all Sparse BLAS related files and I approve of these changes. I love that it is much simpler to distinguish whether we are using oneMATH variables or the backend variables in the files now. This is a good thing!

@spencerpatty
Copy link

@oneapi-src/onemkl-sparse-write I have updated the PR to fix the conflicts with the cuSPARSE PR. It would be useful if you can review the sparse related changes in this PR before the rocSPARSE PR #544! Log for Intel backends: intel_mklcpu_mklgpu.txt Log for Nvidia backends: log_a100.txt

@Rbiessy I have reviewed the cuSPARSE changes and the general changes for Sparse BLAS and approved them, but in the intel mklcpu/gpu logs it seems like all the SPARSE BLAS tests are skipping. Is that intended? cuSPARSE tests appear to run the first 4 or 5 then skip the rest as well ... is that intended ?

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Nov 27, 2024

@Rbiessy I have reviewed the cuSPARSE changes and the general changes for Sparse BLAS and approved them, but in the intel mklcpu/gpu logs it seems like all the SPARSE BLAS tests are skipping. Is that intended? cuSPARSE tests appear to run the first 4 or 5 then skip the rest as well ... is that intended ?

@spencerpatty yes, this is expected. Unfortunately the tests are not "unit test" so each GTest is actually running a dozen or more smaller tests. See for instance the spmm tests, each call to test_functor_i32 is testing a different configuration of spmm. If any of these configurations is not supported the GTest will be marked as skipped. I think it would be better if each GTest tests one configuration but:

  1. It would differ from most other domains.
  2. We would need some more work to re-use the SYCL queue across GTest to avoid the expensive creation of the queue. This is possible but it requires to use test fixtures with TEST_F which I believes makes it impossible to easily generate the tests for different data types.

We experimented with the smaller unit tests approach in the DFT domain but didn't find a nice solution to re-use the SYCL queue. This becomes an issue the more tests we have.

We made sure that the configurations that are skipped are expected to be skipped when reviewing the PRs for the Intel backends.

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Dec 3, 2024

I fixed some minor conflicts with the rng domain, see the rng tests log attached: log_rng.txt

Thanks all for the feedback, we're proceeding with the migration today!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants