-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Multi-threaded DOTC, GEMV, GEMM along with BLAS Support #155
Conversation
Codecov Report
@@ Coverage Diff @@
## master #155 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 4 4
Lines 182 182
=========================================
Hits 182 182 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Ali, just left a few comments here.
…eAI/pennylane-lightning into faster_linalgeb_support
cblas_zdotu_sub(data_size, data_1, 1, data_2, 1, &result); | ||
} | ||
} else { | ||
#if defined(USE_CBLAS) && USE_CBLAS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does if constexpr
not work here? I'd prefer it if we could use it instead of the preprocessor where possible. Any non-enabled #ifdef
code will never hit the compiler after the preprocessing step, whereas anything with if constexpr
will, and hence can be checked for correctness, even if not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if constexpr
works only for innerProd and innerProdC, but not for matrixVecProd and matrixMatProd. So, to be consistent, I switched using #ifdef
for all places using BLAS operations. Let me know if you still think that using if constexpr
is the better idea for innerProd and innerProdC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should figure out how to keep if constexpr
in here as it will be much safer and maintainable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, using define directive to define undefined BLAS variables in matrixVecProd and matrixMatProd is a safe and practical approach to tackle this problem. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the top-level variables, maybe so, though I still think enabling this with if constexpr
is the way to go wherever we can. With the preprocessor route half the code gets disabled before hitting the compiler; going with if constexpr
allows for better static code analysis, and is a feature of the modern uses of the C++. Avoiding the preprocessor unless required has become a common theme in many C++ articles (https://cor3ntin.github.io/posts/undef_preprocessor/ is a good example). If using if constexpr
is not possible, maybe in this instance many of the helpers in the type_traits
library could help out here (https://en.cppreference.com/w/cpp/header/type_traits), such as enable_if
for example.
As for keeping defines where needed at the top of the file, that can be fine, but the approach as taken in https://blog.tartanllama.xyz/if-constexpr/ with enums is probably preferable where possible. Thanks for tolerating me on this 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying your concerns. This is also a great read on if constexpr
. I get it now what to do to "hopefully" address all the concerns within #ifdef
blocks and define macros 😅
#define CblasTrans 0 | ||
#define CblasNoTrans 0 | ||
#define CblasRowMajor 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these flags are used to select logical options, defining them with an enum
may be a safer move. This can be placed in an anonymous namespace to enable visibility within the file only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This article by Sy Brand who works on Microsoft's MSVC compiler has some good examples of doing conditional compilation for different operating systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it will be fine to mirror the enum from here, with the same values if not already defined.
#define DOTU_STD_CROSSOVER (1 << 10) | ||
#define DOTC_STD_CROSSOVER (1 << 10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These may also be better used as template non-type arguments in the functions they are used in. This way we can conditional compile it using templating, and not be required to specify it as a defined magic value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Ali, thanks for the changes here. Though, I noticed some things (listed below). A note too when building for OpenBLAS support: the resulting .so fails to link against the OpenBLAS library (not sure of the fix here).
Also, I am seeing this implementation slower for real use-cases (running Trevor's benchmark scripts for the Jacobain), in comparison to the non-BLAS implementations. I wonder if there is something we need to consider with tuning the methods to improve performance here.
constexpr bool USE_CBLAS = false; | ||
#endif | ||
|
||
#ifndef CBLAS_TRANSPOSE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fails to catch the definition of CBLAS_TRANSPOSE
when building against OpenBLAS on my machine (g++ v11.1), due to multiple definitions.
}; | ||
#endif | ||
|
||
#ifndef CBLAS_LAYOUT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar here. I wonder if there is a better way of scoping these so they do not have problems with the already defined CBLAS versions.
Context:
Add Cache-Friendly and Multi-threaded DOTC, GEMV, GEMM along with BLAS Support to PL-lightning
Description of the Change:
Update
innerProd
andinnerProdC
.Add
Transpose
,matrixVecProd
, andmatrixMatProd
.Benefits:
Utilizing BLAS operations: inner, matrix-vector, and matrix-matrix products are orders of magnitude faster than plain implementation of such arithmetic. This PR adds BLAS support as well as a cache-friendly and multi-threaded implementation of these operations in PL-lightning.
Possible Drawbacks:
Related GitHub Issues: