-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow test/3
drivers to use default ind_t
method.
#804
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Details: - Previously, the standalone performance drivers in test/3 were written under the assumption that the user would want to explicitly test either native execution *or* 1m. But because the accompanying runme.sh script defaults to passing "native" in for the -i command line option (which explicitly sets the induced method type), using it without modification causes the test drivers to use reference microkernels on systems where native complex-domain microkernels are not registered. Furthermore, even if a user was aware of this, the test drivers did not support any single value for the -i option that would test BLIS using the library's default behavior -- that is, using 1m on systems where it is needed and native execution on systems that have native microkernels. - This commit addresses the aforementioned issue by supporting a new value for the -i option: "auto". The "auto" value causes the driver to avoid explicitly setting the induced method altogether, leaving BLIS's default behavior in place. This "auto" option is now the default setting within the runme.sh script. Thanks to Leick Robinson for finding and reporting this issue. - Also added support for "nat" as a shorthand for "native", which the help text already (erroneously) claimed was supported.
fgvanzee
added a commit
that referenced
this pull request
Apr 25, 2024
Details: - This is an "omnibus" commit, consisting of multiple medium-sized commits that affect non-trivial aspects of BLIS. The major highlights: - Relocated the pba, sba pool (from the rntm_t), and mem_t (from the cntl_t) to the thrinfo_t object. This allows the rntm_t to be effectively const (although it is sometimes copied internally and modified to reflect different ways of parallelism). Moving the mem_t sets the stage for sharing a global control tree amongst all threads. - De-templatized the macrokernels for gemmt, trmm, and trsm to match the macrokernel for gemm, which has been de-templatized since 54fa28b. - Reimplemented bli_l3_determine_kc() by separating out the logic for adjusting KC based on MR/NR for triangular A and/or B into a new function, bli_l3_adjust_kc(). For now, this function is still called from bli_l3_determine_kc(), but in the future we plan to have it called once when constructing the control tree. - Refactored the level-3 thread decorator into two parts: - One part deals only with launching threads, each one calling a generic thread entry function. This code resides in frame/thread and constitutes the definition of bli_thread_launch(). Note that it is specific to the threading implementation (OpenMP, pthreads, single, etc.) - The other part deals with passing the matrix operands and related information into bli_thread_launch(). This is the "l3 decorator" and now resides in frame/3. It is agnostic to the threading implementation. - Modified the "level" of the thread control tree passed in at each operation. Previously, each operation (e.g. bli_gemm_blk_var1()) was passed in a communicator representing the active thread teams which would share the available work. Now, the *parent* thread comm is passed in. The operation then grabs the child comm and uses it to partition the work. The difference is in bli_trsm_blk_var1(), where there are now two children nodes for this single operation (i.e. the thread control tree is split one level above where the control tree is). The sub-prenode is used for the trsm subproblem while the normal sub-node is used for the gemm part. Importantly, the parent comm is used for the barrier between them. - Removed cntl_t* arguments from bli_*_front() functions. These will be added back in the future when the control tree's creation is moved so that it happens much sooner (provided that bli_*_front() have not been absorbed into their respective bli_*_ex() functions). - Renamed various bli_thread_*() query functions to bli_thrinfo_*(), for consistency. This includes _num_threads(), _thread_id(), _n_way(), _work_id(), _sba_pool(), _pba(), _mem(), _barrier(), _broadcast(), and _am_chief(). - Removed extraneous barrier from _blk_var3() of gemm and trsm. - Fixed a typo in bli_type_defs.h where BLIS_BLAS_INT_TYPE_SIZE was misspelled. - (cherry picked from commit aeb5f0c) Fixed performance bug caused by redundant packing. (#680) Details: - Fixed a performance bug whereby multiple threads were redundantly packing the same (rather than separate) micropanels. This bug was caused by different parts of the code using the num_threads/thread_id field of the thrinfo_t vs. the n_way/work_id fields. The fix was to standardize on the latter and provide a "fake" thrinfo_t sub-prenode in the thrinfo tree which consists of single-member thread teams. The single team with multiple threads node is still required since it and only it can be used to perform barriers and broadcasts (e.g. of the packed buffer pointer). - (cherry picked from commit 29f79f0) Fixed random segfault in test/3 drivers. (#788) Details: - Fixed a segfault in the non-gemm test drivers in test/3 that was the result of sometimes leaving either .n_str or .k_str fields of the params_t struct uninitialized, depending on the operation in question. For example, in test_hemm.c, init_def_params() would only initialize the .m_str and .n_str fields, but not the .k_str field. Even though hemm doesn't use a 'k' dimension, the proc_params() function (called via parse_cl_params()) universally attempts to convert all three into integers via sscanf(), which was understandably failing when one of those strings was a NULL pointer. I'm not sure how this code ever worked to begin with. Special thanks to Leick Robinson for finding and reporting this bug. - (cherry picked from commit 1236dda) Fixed staleness in kernels/zen/3/bli_gemm_small.c. Details: - Added missing 'const' keyword in function prototypes for bli_gemm_small() and friends. - Updated pba usage to reflect new APIs. - Fixed syntax typo in 'export GOMP_CPU_AFFINITY' line in ul2128 conditional of test/3/runme.sh. - Thanks to Jeff Diamond for reporting these issues. Allow test/3 drivers to use default ind_t method. (#804) Details: - Previously, the standalone performance drivers in test/3 were written under the assumption that the user would want to explicitly test either native execution *or* 1m. But because the accompanying runme.sh script defaults to passing "native" in for the -i command line option (which explicitly sets the induced method type), running the script without modification causes the test drivers to use slow reference microkernels on systems where native complex-domain microkernels are not registered -- which will yield poor performance for complex-domain level-3 operations. Furthermore, even if a user was aware of this, the test drivers did not support any single value for the -i option that would test BLIS using the library's default behavior -- that is, using 1m on systems where it is needed and native execution on systems that have native microkernels implemented and registered. - This commit addresses the aforementioned issue by supporting a new value for the -i option: "auto". The "auto" value causes the driver to avoid explicitly setting the induced method altogether, leaving BLIS's default behavior in place. This "auto" option is also now the default setting within the runme.sh script. Thanks to Leick Robinson for finding and reporting this issue. - Also added support for "nat" as a shorthand for "native", which the help text already (erroneously) claimed was supported. - (cherry picked from commit fd1a7e3)
fgvanzee
added a commit
that referenced
this pull request
Apr 30, 2024
- (cherry picked from commit c803b03) Fix auto-detection of firestorm (Apple M1). - (cherry picked from commit 2dd692b) Added Discord documentation (#677) Details: - Added a docs/Discord.md markdown document that walks the reader through creating a Discord account, obtaining the invite link, and using the link to join the BLIS Discord server. - Updated README.md to reference the new Discord.md document in multiple places, including via the official Discord logo (used with explicit permission from representatives at Discord Inc.). - (cherry picked from commit 88105db) Shuffled checked properties in bli_l3_check.c. (#676) Details: - Added certain checks for matrix structure to the level-3 operations' _check() functions, and slightly reorganized existing checks. - (cherry picked from commit 23f5b8d) CREDITS file update. Details: - This attribution was intended to go in PR #647. - (cherry picked from commit 9453e0f) Reinstate sanity check in bli_pool_finalize. (#671) Details: - Added a reinit argument to bli_pool_finalize(). This bool will signal whether or not the function is being called from bli_pool_reinit(). If it is not being called from _reinit(), we can safely check to confirm that .top_index == 0 (i.e., all blocks have been checked in). But if it *is* being called from _reinit(), then that check will be skipped since one of the predicted use cases for bli_pool_reinit() anticipates that some blocks are (probably) checked out when the pool_t is reinitialized. - Updated existing invocations of bli_pool_finalize() to pass in either FALSE (from bli_apool_free_block() or bli_pba_finalize_pools()) or TRUE (from bli_pool_reinit()) for the new reinit argument. - (cherry picked from commit 76a23bd) Fix some bugs in bli_pool.c (#670) Details: - Add a check for premature pool exhaustion when checking in blocks via bli_pool_checkin_block(). This detects "double-free" and other bad conditions that don't necessarily result in a segfault. - Make sure to copy all block pointers when growing the pool size. Previously, checked-out block pointers (which are guaranteed to be set to NULL) were not being copied, leading to the presence of uninitialized data. - (cherry picked from commit 63470b4) Add AddressSanitizer (-fsanitize=address) option. (#669) Details: - Added support for AddressSanitizer (ASan), a compiler-integrated memory error detector. The option (disabled by default) enables compiling and linking with the -fsanitize=address flag supported by clang, gcc, and probably others. This flag is employed during compilation of all BLIS source files *except* for optimized kernels, which are exempted because ASan usually requires an extra register, which violates the constraints for many gemm microkernels. - Minor whitespace, comment, ordering, and configure help text updates. - (cherry picked from commit 42d0e66) Add consistent NaN/Inf handling in sumsqv. (#668) Details: - Changed sumsqv implementation as follows: - If there is a NaN (either real or imaginary), then return a sum of NaN and unit scale. - Else, if there is an Inf (either real or imaginary), then return a sum of +Inf and unit scale. - Otherwise behave as normal. - (cherry picked from commit b861c71) Parameterized test/3 drivers via command line args. (#667) Details: - Rewrote the drivers in test/3, the Makefile, and the runme.sh script so that most of the important parameters, including parameter combo, datatype, storage combo, induced method, problem size range, dimension bindings, number of repeats, and alpha/beta values can be passed in via command line arguments. (Previously, most of these parameters were hard-coded into the driver source, except a few that were hard-coded into the Makefile.) If no argument is given for any particular option, it will be assigned a sane default. Either way, the values employed at runtime will be printed to stdout before the performance data in a section that is commented out with '%' characters (which is used by matlab and octave for comments), unless the -q option is given, in which case the driver will proceed quietly and output only performance data. Each driver also provides extensive help via the -h option, with the help text tailored for the operation in question (e.g. gemm, hemm, herk, etc.). In this help text, the driver reminds the user which implementation it was linked to (e.g. blis, openblas, vendor, eigen). Thanks to Jeff Diamond for suggesting this CLI-based reimagining of the test/3 drivers. - In the test/3 drivers: converted cpp macro string constants, as well as two string literals (for the opname and pc_str) used in each test driver, to global (or static) const char* strings, and replaced the use of strncpy() for storing the results of the command line argument parsing with pointer copies from the corresponding strings in argv. This works because the argv array is guaranteed by the C99 standard to persist throughout the life of the program. This new approach uses less storage and executes faster. Thanks to Minh Quan Ho for recommending this change. - Renamed the IMP_STR cpp macro that gets defined on the command line, via the test/3/Makefile, to IMPL_STR. - Updated runme.sh to set the problem size ranges for single-threaded and multithreaded execution independently from one another, as well as on a per-system basis. - Added a 'quiet' variable to runme.sh that can easily toggle quiet mode for the test drivers' output. - Very minor typecast fix in call to bli_getopt() in bli_utils.c. - In bli_getopt(), changed the nextchar variable from being a local static variable to a field of the getopt_t state struct. (Not sure why it was ever declared static to begin with.) - Other minor changes to bli_getopt() to accommodate the rewritten test drivers' command line parsing needs. - (cherry picked from commit ee81efc) Allow test/3 drivers to use default ind_t method. (#804) Details: - Previously, the standalone performance drivers in test/3 were written under the assumption that the user would want to explicitly test either native execution *or* 1m. But because the accompanying runme.sh script defaults to passing "native" in for the -i command line option (which explicitly sets the induced method type), running the script without modification causes the test drivers to use slow reference microkernels on systems where native complex-domain microkernels are not registered -- which will yield poor performance for complex-domain level-3 operations. Furthermore, even if a user was aware of this, the test drivers did not support any single value for the -i option that would test BLIS using the library's default behavior -- that is, using 1m on systems where it is needed and native execution on systems that have native microkernels implemented and registered. - This commit addresses the aforementioned issue by supporting a new value for the -i option: "auto". The "auto" value causes the driver to avoid explicitly setting the induced method altogether, leaving BLIS's default behavior in place. This "auto" option is also now the default setting within the runme.sh script. Thanks to Leick Robinson for finding and reporting this issue. - Also added support for "nat" as a shorthand for "native", which the help text already (erroneously) claimed was supported. - (cherry picked from commit fd1a7e3) Use "-i auto" by default in test/3 drivers. Details: - Request default induced method behavior of BLIS via "-i auto" when running the standalone performance drivers in test/3 via the runme.sh script present in that directory. (Previously, the runme.sh script would use "-i native" by default.) This change was originally intended for fd1a7e3. - (cherry picked from commit cad5149)
fgvanzee
added a commit
that referenced
this pull request
Apr 30, 2024
Details: - Previously, the standalone performance drivers in test/3 were written under the assumption that the user would want to explicitly test either native execution *or* 1m. But because the accompanying runme.sh script defaults to passing "native" in for the -i command line option (which explicitly sets the induced method type), running the script without modification causes the test drivers to use slow reference microkernels on systems where native complex-domain microkernels are not registered -- which will yield poor performance for complex-domain level-3 operations. Furthermore, even if a user was aware of this, the test drivers did not support any single value for the -i option that would test BLIS using the library's default behavior -- that is, using 1m on systems where it is needed and native execution on systems that have native microkernels implemented and registered. - This commit addresses the aforementioned issue by supporting a new value for the -i option: "auto". The "auto" value causes the driver to avoid explicitly setting the induced method altogether, leaving BLIS's default behavior in place. This "auto" option is also now the default setting within the runme.sh script. Thanks to Leick Robinson for finding and reporting this issue. - Also added support for "nat" as a shorthand for "native", which the help text already (erroneously) claimed was supported. - (cherry picked from commit fd1a7e3)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Details:
test/3
were written under the assumption that the user would want to explicitly test either native execution or 1m. But because the accompanyingrunme.sh
script defaults to passing "native" in for the-i
command line option (which explicitly sets the induced method type), using it without modification causes the test drivers to use reference microkernels on systems where native complex-domain microkernels are not registered. Furthermore, even if a user was aware of this, the test drivers did not support any single value for the-i
option that would test BLIS using the library's default behavior -- that is, using 1m on systems where it is needed and native execution on systems that have native microkernels.-i
option:"auto"
. The"auto"
value causes the driver to avoid explicitly setting the induced method altogether, leaving BLIS's default behavior in place. This"auto"
option is now the default setting within therunme.sh
script. Thanks to Leick Robinson for finding and reporting this issue."nat"
as a shorthand for"native"
, which the help text already (erroneously) claimed was supported.