Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Multi-threaded DOTC, GEMV, GEMM along with BLAS Support #155

Merged
merged 46 commits into from
Nov 15, 2021

Conversation

maliasadi
Copy link
Member

@maliasadi maliasadi commented Oct 13, 2021

Context:
Add Cache-Friendly and Multi-threaded DOTC, GEMV, GEMM along with BLAS Support to PL-lightning

Description of the Change:
Update innerProd and innerProdC.
Add Transpose, matrixVecProd, and matrixMatProd.

Benefits:
Utilizing BLAS operations: inner, matrix-vector, and matrix-matrix products are orders of magnitude faster than plain implementation of such arithmetic. This PR adds BLAS support as well as a cache-friendly and multi-threaded implementation of these operations in PL-lightning.

Possible Drawbacks:

Related GitHub Issues:

@codecov
Copy link

codecov bot commented Oct 13, 2021

Codecov Report

Merging #155 (1de986e) into master (3cc54a3) will not change coverage.
The diff coverage is n/a.

❗ Current head 1de986e differs from pull request most recent head 8b9bc71. Consider uploading reports for the commit 8b9bc71 to get more accurate results
Impacted file tree graph

@@            Coverage Diff            @@
##            master      #155   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            4         4           
  Lines          182       182           
=========================================
  Hits           182       182           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3cc54a3...8b9bc71. Read the comment docs.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 13, 2021

Test Report (C++) on Ubuntu

       1 files  ±    0         1 suites  ±0   0s ⏱️ ±0s
   345 tests +  18     345 ✔️ +  18  0 💤 ±0  0 ±0 
1 919 runs  +734  1 919 ✔️ +734  0 💤 ±0  0 ±0 

Results for commit 8b9bc71. ± Comparison against base commit 3cc54a3.

♻️ This comment has been updated with latest results.

@maliasadi maliasadi changed the title Add Cache-Friendly DOTC, GEMV, GEMM along with BLAS Support Add Multi-threaded DOTC, GEMV, GEMM along with BLAS Support Oct 13, 2021
Copy link
Member

@mlxd mlxd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Ali, just left a few comments here.

cblas_zdotu_sub(data_size, data_1, 1, data_2, 1, &result);
}
} else {
#if defined(USE_CBLAS) && USE_CBLAS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does if constexpr not work here? I'd prefer it if we could use it instead of the preprocessor where possible. Any non-enabled #ifdef code will never hit the compiler after the preprocessing step, whereas anything with if constexpr will, and hence can be checked for correctness, even if not used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if constexpr works only for innerProd and innerProdC, but not for matrixVecProd and matrixMatProd. So, to be consistent, I switched using #ifdef for all places using BLAS operations. Let me know if you still think that using if constexpr is the better idea for innerProd and innerProdC?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should figure out how to keep if constexpr in here as it will be much safer and maintainable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, using define directive to define undefined BLAS variables in matrixVecProd and matrixMatProd is a safe and practical approach to tackle this problem. What do you think?

Copy link
Member

@mlxd mlxd Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the top-level variables, maybe so, though I still think enabling this with if constexpr is the way to go wherever we can. With the preprocessor route half the code gets disabled before hitting the compiler; going with if constexpr allows for better static code analysis, and is a feature of the modern uses of the C++. Avoiding the preprocessor unless required has become a common theme in many C++ articles (https://cor3ntin.github.io/posts/undef_preprocessor/ is a good example). If using if constexpr is not possible, maybe in this instance many of the helpers in the type_traits library could help out here (https://en.cppreference.com/w/cpp/header/type_traits), such as enable_if for example.

As for keeping defines where needed at the top of the file, that can be fine, but the approach as taken in https://blog.tartanllama.xyz/if-constexpr/ with enums is probably preferable where possible. Thanks for tolerating me on this 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying your concerns. This is also a great read on if constexpr. I get it now what to do to "hopefully" address all the concerns within #ifdef blocks and define macros 😅

pennylane_lightning/src/util/Util.hpp Outdated Show resolved Hide resolved
pennylane_lightning/src/util/Util.hpp Outdated Show resolved Hide resolved
Comment on lines 40 to 42
#define CblasTrans 0
#define CblasNoTrans 0
#define CblasRowMajor 0
Copy link
Member

@mlxd mlxd Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these flags are used to select logical options, defining them with an enum may be a safer move. This can be placed in an anonymous namespace to enable visibility within the file only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This article by Sy Brand who works on Microsoft's MSVC compiler has some good examples of doing conditional compilation for different operating systems.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it will be fine to mirror the enum from here, with the same values if not already defined.

Comment on lines 44 to 45
#define DOTU_STD_CROSSOVER (1 << 10)
#define DOTC_STD_CROSSOVER (1 << 10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These may also be better used as template non-type arguments in the functions they are used in. This way we can conditional compile it using templating, and not be required to specify it as a defined magic value.

pennylane_lightning/src/util/Util.hpp Outdated Show resolved Hide resolved
Copy link
Member

@mlxd mlxd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Ali, thanks for the changes here. Though, I noticed some things (listed below). A note too when building for OpenBLAS support: the resulting .so fails to link against the OpenBLAS library (not sure of the fix here).

Also, I am seeing this implementation slower for real use-cases (running Trevor's benchmark scripts for the Jacobain), in comparison to the non-BLAS implementations. I wonder if there is something we need to consider with tuning the methods to improve performance here.

constexpr bool USE_CBLAS = false;
#endif

#ifndef CBLAS_TRANSPOSE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails to catch the definition of CBLAS_TRANSPOSE when building against OpenBLAS on my machine (g++ v11.1), due to multiple definitions.

};
#endif

#ifndef CBLAS_LAYOUT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar here. I wonder if there is a better way of scoping these so they do not have problems with the already defined CBLAS versions.

@maliasadi maliasadi merged commit 63929ed into master Nov 15, 2021
@maliasadi maliasadi deleted the faster_linalgeb_support branch November 15, 2021 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants