
Add AVX2/512 kernels #313

Merged
merged 71 commits into from
Aug 5, 2022


@chaeyeunpark (Contributor) commented Jul 5, 2022

Context:

Description of the Change:

Benefits:

Possible Drawbacks:

Related GitHub Issues:

github-actions bot (Contributor) commented Jul 5, 2022

Hello. You may have forgotten to update the changelog!
Please edit .github/CHANGELOG.md with:

  • A one-to-two sentence description of the change. You may include a small working example for new features.
  • A link back to this PR.
  • Your name (or GitHub username) in the contributors section.

github-actions bot (Contributor) commented Jul 5, 2022

Test Report (Linux, Kokkos, and Kokkos Kernels) on Ubuntu

  1 file ±0, 1 suite ±0, 2s ⏱️ ±0s
  956 tests -19: 956 ✔️ passed -19, 0 💤 skipped ±0, 0 failed ±0
  234 969 runs +5 689: 234 969 ✔️ passed +5 689, 0 💤 skipped ±0, 0 failed ±0

Results for commit 740f4b2. ± Comparison against base commit e0768d6.

♻️ This comment has been updated with latest results.

codecov bot commented Jul 15, 2022

Codecov Report

Merging #313 (740f4b2) into master (e0768d6) will decrease coverage by 0.05%.
The diff coverage is 99.48%.

@@            Coverage Diff             @@
##           master     #313      +/-   ##
==========================================
- Coverage   99.79%   99.74%   -0.06%     
==========================================
  Files          40       73      +33     
  Lines        3929     5108    +1179     
==========================================
+ Hits         3921     5095    +1174     
- Misses          8       13       +5     
Impacted Files Coverage Δ
pennylane_lightning/src/gates/GateUtil.cpp 100.00% <ø> (ø)
pennylane_lightning/src/gates/GateUtil.hpp 100.00% <ø> (ø)
pennylane_lightning/src/util/IntegerInterval.hpp 100.00% <ø> (ø)
pennylane_lightning/src/util/Memory.hpp 100.00% <ø> (ø)
...nnylane_lightning/src/simulator/CPUMemoryModel.hpp 90.90% <87.50%> (-5.25%) ⬇️
...ennylane_lightning/src/simulator/KernelMap_X64.cpp 88.88% <88.88%> (ø)
...tning/src/gates/cpu_kernels/avx_common/AVXUtil.hpp 94.59% <94.59%> (ø)
pennylane_lightning/_version.py 100.00% <100.00%> (ø)
pennylane_lightning/lightning_qubit.py 99.62% <100.00%> (+<0.01%) ⬆️
...ylane_lightning/src/gates/RegisterKernels_AVX2.cpp 100.00% <100.00%> (ø)
... and 37 more


@chaeyeunpark (Contributor, Author) commented Jul 16, 2022

It seems I need to refactor the code a little more. Still, here is a summary of the benchmark results so far.

Results for the RX gate:
[Figure: RX gate benchmark]

And for the CZ gate:
[Figure: CZ gate benchmark]

We see a remarkable speedup for 10-18 qubits. For the RX gate in single precision, it is up to 10 times faster. I suspect that AVX2/512 is less effective for a large number of qubits because a gate operation within packed data (both amplitudes of a pair in the same SIMD register) is faster than one between packed data (amplitudes in different registers), and with more qubits a larger share of the wires falls into the latter case. The CZ gate shows a similar trend, although the speedup is less pronounced.
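The within/between distinction can be illustrated with a scalar reference sketch (no intrinsics). It assumes the common layout where bit r of an amplitude index belongs to (reversed) wire r, so a single-qubit gate on that wire pairs amplitudes 2^r apart. A 256-bit AVX2 register holds 4 complex<float> or 2 complex<double> values (8 and 4 for 512-bit AVX-512), so only wires whose stride 2^r is smaller than that count keep both amplitudes of a pair in one register. The names applyRX, applyCZ, complexPerRegister and isInternalWire are hypothetical and are not Lightning's API.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Illustrative scalar kernels (hypothetical names, not Lightning's API).
// Assumed layout: bit r of an amplitude index belongs to "reversed wire" r,
// so a single-qubit gate on that wire pairs indices i0 and i0 + 2^r.

// RX(theta) = [[cos(theta/2), -i sin(theta/2)], [-i sin(theta/2), cos(theta/2)]]
template <typename T>
void applyRX(std::vector<std::complex<T>> &state, std::size_t numQubits,
             std::size_t revWire, T theta) {
    const std::size_t stride = std::size_t{1} << revWire;
    const std::size_t lowMask = stride - 1;
    const T c = std::cos(theta / 2);
    const std::complex<T> minusIS{T{0}, -std::sin(theta / 2)};
    const std::size_t numPairs = std::size_t{1} << (numQubits - 1);
    for (std::size_t k = 0; k < numPairs; ++k) {
        // Insert a zero bit at position revWire to get the first index of the pair.
        const std::size_t i0 = ((k & ~lowMask) << 1) | (k & lowMask);
        const std::size_t i1 = i0 | stride;
        const std::complex<T> v0 = state[i0];
        const std::complex<T> v1 = state[i1];
        state[i0] = c * v0 + minusIS * v1;
        state[i1] = minusIS * v0 + c * v1;
    }
}

// CZ is diagonal: negate every amplitude whose two wire bits are both 1.
template <typename T>
void applyCZ(std::vector<std::complex<T>> &state, std::size_t revWire0,
             std::size_t revWire1) {
    const std::size_t mask =
        (std::size_t{1} << revWire0) | (std::size_t{1} << revWire1);
    for (std::size_t i = 0; i < state.size(); ++i) {
        if ((i & mask) == mask) {
            state[i] = -state[i];
        }
    }
}

// Number of std::complex<T> values that fit in one SIMD register
// (256 bits for AVX2, 512 bits for AVX-512).
template <typename T>
constexpr std::size_t complexPerRegister(std::size_t registerBits) {
    return registerBits / (2 * 8 * sizeof(T));
}

// True when both amplitudes of a pair lie in the same aligned register
// ("within packed data"); false when they sit in different registers.
template <typename T>
constexpr bool isInternalWire(std::size_t revWire, std::size_t registerBits) {
    return (std::size_t{1} << revWire) < complexPerRegister<T>(registerBits);
}
```

Under this layout, with n qubits only log2(W) of the n wires are internal (W being complexPerRegister), so the share of within-register work shrinks as n grows. That is consistent with the trend suspected above, but it is not by itself a proof of it; memory bandwidth may also play a role at large n.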

@chaeyeunpark changed the title from [WIP] Add AVX kernels to [WIP] Add AVX2/512 kernels on Jul 16, 2022

@chaeyeunpark (Contributor, Author) commented Jul 18, 2022

One subtle point: enabling FMA (-mfma) speeds up some gate operations of the AVX2 kernels by around 10%. Since most CPUs that support AVX2 also support FMA, we could always compile the AVX2 kernels with FMA and enable them at runtime only when both instruction sets are supported. Any thoughts?
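A minimal sketch of the proposed runtime check, assuming the AVX2 kernels are built with both -mavx2 and -mfma and therefore may only be selected when the CPU reports both features. CPUFeatures, KernelKind, detectCPUFeatures and selectKernel are hypothetical names, not Lightning's dispatch API. Detection uses the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports on x86 and otherwise leaves every flag false, which selects the scalar kernels.

```cpp
// Hypothetical CPU feature flags and kernel kinds (not Lightning's API).
struct CPUFeatures {
    bool avx2 = false;
    bool fma = false;
    bool avx512f = false;
};

enum class KernelKind { Scalar, AVX2_FMA, AVX512 };

// Query the running CPU. Only GCC/Clang on x86 are handled here; on any
// other compiler or architecture all flags stay false.
inline CPUFeatures detectCPUFeatures() {
    CPUFeatures f;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.avx2 = __builtin_cpu_supports("avx2") != 0;
    f.fma = __builtin_cpu_supports("fma") != 0;
    f.avx512f = __builtin_cpu_supports("avx512f") != 0;
#endif
    return f;
}

// AVX2 kernels are assumed to be compiled with -mavx2 -mfma, so they are
// chosen only when both features are present. The AVX-512 kernels are
// assumed (for this sketch) to require only AVX-512F.
inline KernelKind selectKernel(const CPUFeatures &f) {
    if (f.avx512f) {
        return KernelKind::AVX512;
    }
    if (f.avx2 && f.fma) {
        return KernelKind::AVX2_FMA;
    }
    return KernelKind::Scalar;
}
```

Keeping detection separate from selection lets the selection rule be unit-tested with hand-made feature sets, independent of the machine the tests happen to run on.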

@chaeyeunpark (Contributor, Author) commented Aug 3, 2022

@mlxd I am still not sure whether we want to exclude the AVX512 kernel from Codecov.

@mlxd (Member) left a comment

Great job putting this together, @chaeyeunpark. It really helps out. I have some small suggestions, but nothing major.

Happy to give it another pass after this and call it done.

Review threads (resolved):
  • doc/avx_kernels/build_system.rst: 4 threads (outdated)
  • doc/avx_kernels/implementation.rst: 6 threads (5 outdated)

@chaeyeunpark (Contributor, Author) commented Aug 3, 2022

Thanks, everyone, for the thorough reviews! Codecov coverage is not fully satisfying (it is still above 99%), but I would like to improve it in follow-up PRs. Is that okay with you?

@mlxd (Member) commented Aug 4, 2022

> @mlxd I am still not so sure whether we want to exclude the AVX512 kernel from codecov.

Yeah, I think this is sensible, since we cannot guarantee which CI runner we will get, and hence whether it supports AVX-512. Fine with me to exclude it completely for now.

@mlxd (Member) left a comment

Awesome effort here @chaeyeunpark

@multiphaseCFD (Member) left a comment

Thanks @Chae-Yeun! Great job!

@chaeyeunpark merged commit 921cb88 into master on Aug 5, 2022
@chaeyeunpark deleted the add_avx_kernels branch on Aug 5, 2022 21:48
5 participants