-
Notifications
You must be signed in to change notification settings - Fork 919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Update cmake to only build for present GPU #2955
[REVIEW] Update cmake to only build for present GPU #2955
Conversation
} | ||
") | ||
execute_process( | ||
COMMAND ${CMAKE_CUDA_COMPILER} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only change from the cuML version -- CUDA_NVCC_EXECUTABLE
doesn't seem to be defined in the cudf scripts, but CMAKE_CUDA_COMPILER
is. Any ideas why that is @mt-jones?
This is not directly related, but for a develop build, it might be useful to also include the verbose ptxas output option, so we can at least have an idea of register/smem usage of various kernels. |
I think that should be a separate option. Otherwise when debugging build warnings / failures there's more text to scroll through. |
AFAIK the last thing to do for this (aside from changelog) is update any release scripts that build the conda packages to ensure we build all the GPU architectures in the public releases -- @kkraus14 or @mike-wendt do either of you have a moment to point me to the files that'd need the new |
@trxcllnt if you make the default to build all architectures as it always has been then I think you will risk less surprise for non-developer users who have to build from source, and you won't have to change scripts for CI. Make this an opt-in flag for developers. |
@harrism good point, that seems reasonable. I'll make the flag default to "ALL", so the cmake options are still consistent with cuML's cmake script. |
I should mention local builds using the new architecture detection logic show warnings like this:
Is this an issue that needs to be addressed before merging, or is it safe to ignore these? |
Yeah I am not comfortable without understanding the cause of those warnings. Can you tell me more? What code is causing the warnings? |
Looks like the kernel launch parameters are out of bounds for my local arch/gpu? cudf/cpp/src/io/orc/dict_enc.cu Lines 613 to 615 in 2134804
|
Hmm The intent here is to limit register count to 40, and shared mem usage is 28KB. If compiling for a GPU with only 64K of shared mem (which I'm assuming is the case here), it could result in this warning. If the result is minctapersm being ignored, it sounds like it's safe to ignore it. If on the other hand we end up with ignoring maxregcount and compiling the kernel with more than 40 registers, that would result in a near-2x perf drop, so I'd like to avoid that, but I'm not aware of another way to limit register count without also creating a minctapersm. |
Actually, using {512, 2} also has the same intended effect of limiting regcount to 40 (at least on TU102), while just specifying {512} results in 43 registers (bad). It just looks like ptxas works a little bit harder if there is a maxregcount limit, regardless of the limit. |
Codecov Report
@@ Coverage Diff @@
## branch-0.11 #2955 +/- ##
===============================================
+ Coverage 86.9% 86.95% +0.04%
===============================================
Files 49 49
Lines 9187 9211 +24
===============================================
+ Hits 7984 8009 +25
+ Misses 1203 1202 -1
Continue to review full report at Codecov.
|
d75361e
to
3cc2315
Compare
@OlivierNV are you going to make that launch_bounds change? |
Is there a way to add a file to this PR from github ? (I tried to, but it weirdly created a new branch) |
Not that I know of. I still have to figure out how to fix mine anyway. |
@trxcllnt what GPU arch are you compiling for that is exposing this? |
@trxcllnt Not sure about 'ordered_scatter_to_tables', but for gpuInitDictionaryIndices, the {512, 3} can simply be changed to {512, 2} in cpp/src/io/orc/dict_enc.cu (though it might resurface on Tegra, I hear there are SKUs with only 32K shared mem, but as long as nobody exceeds the threads/block limit, it sounds like it's not fatal) |
@OlivierNV The only way to modify a PR is to modify it's branch, which is under trxcllnt's fork. That would require adding his repo as a remote, fetching that branch, making changes, a commit, pushing those changes to your own fork, and then creating a separate PR against trxcllnt's branch (assuming you don't have push access, which you won't, since it's not your fork). That's just for informational purposes. You might ask trxcllnt what he'd prefer. |
@harrism I saw these compiling on the RTX 8000s in my workstation. There are more warnings than those I listed, but elided them for brevity. I'll recompile again here in a bit and paste the full list. @OlivierNV the easiest way is to send a PR to this branch on my fork. |
@harrism @OlivierNV ok, here's the full set of warnings I'm seeing on the RTX 8000 (after pulling the latest changes from @OlivierNV). I think these are all caused by this same line in cudf/cpp/src/stream_compaction/copy_if.cuh Line 342 in 6069426
Click to expand warnings listptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype13EEENS_18valid_table_filterELi256ELb0EEEvPT_PjPiPKS7_PKjSA_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype13EEENS_18valid_table_filterELi256ELb1EEEvPT_PjPiPKS7_PKjSA_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype11EEENS_18valid_table_filterELi256ELb0EEEvPT_PjPiPKS7_PKjSA_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype11EEENS_18valid_table_filterELi256ELb1EEEvPT_PjPiPKS7_PKjSA_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype10EEENS_18valid_table_filterELi256ELb0EEEvPT_PjPiPKS7_PKjSA_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype10EEENS_18valid_table_filterELi256ELb1EEEvPT_PjPiPKS7_PKjSA_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype9EEENS_18valid_table_filterELi256ELb0EEEvPT_PjPiPKS7_PKjSA_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype9EEENS_18valid_table_filterELi256ELb1EEEvPT_PjPiPKS7_PKjSA_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype8EEENS_18valid_table_filterELi256ELb0EEEvPT_PjPiPKS7_PKjSA_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype8EEENS_18valid_table_filterELi256ELb1EEEvPT_PjPiPKS7_PKjSA_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIN4cudf6detail7wrapperIaL9gdf_dtype7EEENS_18valid_table_filterELi256ELb0EEEvPT_PjPiPKS7_PKjSA_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIN4cudf6detail7wrapperIaL9gdf_dtype7EEENS_18valid_table_filterELi256ELb1EEEvPT_PjPiPKS7_PKjSA_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIdNS_18valid_table_filterELi256ELb0EEEvPT_PjPiPKS2_PKjS5_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIdNS_18valid_table_filterELi256ELb1EEEvPT_PjPiPKS2_PKjS5_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIfNS_18valid_table_filterELi256ELb0EEEvPT_PjPiPKS2_PKjS5_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIfNS_18valid_table_filterELi256ELb1EEEvPT_PjPiPKS2_PKjS5_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIlNS_18valid_table_filterELi256ELb0EEEvPT_PjPiPKS2_PKjS5_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIlNS_18valid_table_filterELi256ELb1EEEvPT_PjPiPKS2_PKjS5_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIiNS_18valid_table_filterELi256ELb0EEEvPT_PjPiPKS2_PKjS5_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIiNS_18valid_table_filterELi256ELb1EEEvPT_PjPiPKS2_PKjS5_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIsNS_18valid_table_filterELi256ELb0EEEvPT_PjPiPKS2_PKjS5_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIsNS_18valid_table_filterELi256ELb1EEEvPT_PjPiPKS2_PKjS5_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIaNS_18valid_table_filterELi256ELb0EEEvPT_PjPiPKS2_PKjS5_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN69_GLOBAL__N__45_tmpxft_0000296a_00000000_6_drop_nulls_cpp1_ii_9cbced0314scatter_kernelIaNS_18valid_table_filterELi256ELb1EEEvPT_PjPiPKS2_PKjS5_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype13EEENS_19boolean_mask_filterILb1ELb1EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype13EEENS_19boolean_mask_filterILb1ELb1EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype11EEENS_19boolean_mask_filterILb1ELb1EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype11EEENS_19boolean_mask_filterILb1ELb1EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype10EEENS_19boolean_mask_filterILb1ELb1EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype10EEENS_19boolean_mask_filterILb1ELb1EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype9EEENS_19boolean_mask_filterILb1ELb1EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype9EEENS_19boolean_mask_filterILb1ELb1EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype8EEENS_19boolean_mask_filterILb1ELb1EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype8EEENS_19boolean_mask_filterILb1ELb1EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIaL9gdf_dtype7EEENS_19boolean_mask_filterILb1ELb1EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIaL9gdf_dtype7EEENS_19boolean_mask_filterILb1ELb1EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIdNS_19boolean_mask_filterILb1ELb1EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIdNS_19boolean_mask_filterILb1ELb1EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIfNS_19boolean_mask_filterILb1ELb1EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIfNS_19boolean_mask_filterILb1ELb1EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIlNS_19boolean_mask_filterILb1ELb1EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIlNS_19boolean_mask_filterILb1ELb1EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIiNS_19boolean_mask_filterILb1ELb1EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIiNS_19boolean_mask_filterILb1ELb1EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIsNS_19boolean_mask_filterILb1ELb1EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIsNS_19boolean_mask_filterILb1ELb1EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIaNS_19boolean_mask_filterILb1ELb1EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIaNS_19boolean_mask_filterILb1ELb1EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype13EEENS_19boolean_mask_filterILb1ELb0EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype13EEENS_19boolean_mask_filterILb1ELb0EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype11EEENS_19boolean_mask_filterILb1ELb0EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype11EEENS_19boolean_mask_filterILb1ELb0EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype10EEENS_19boolean_mask_filterILb1ELb0EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype10EEENS_19boolean_mask_filterILb1ELb0EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype9EEENS_19boolean_mask_filterILb1ELb0EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype9EEENS_19boolean_mask_filterILb1ELb0EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype8EEENS_19boolean_mask_filterILb1ELb0EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype8EEENS_19boolean_mask_filterILb1ELb0EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIaL9gdf_dtype7EEENS_19boolean_mask_filterILb1ELb0EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIaL9gdf_dtype7EEENS_19boolean_mask_filterILb1ELb0EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIdNS_19boolean_mask_filterILb1ELb0EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIdNS_19boolean_mask_filterILb1ELb0EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIfNS_19boolean_mask_filterILb1ELb0EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIfNS_19boolean_mask_filterILb1ELb0EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIlNS_19boolean_mask_filterILb1ELb0EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIlNS_19boolean_mask_filterILb1ELb0EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIiNS_19boolean_mask_filterILb1ELb0EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIiNS_19boolean_mask_filterILb1ELb0EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIsNS_19boolean_mask_filterILb1ELb0EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIsNS_19boolean_mask_filterILb1ELb0EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIaNS_19boolean_mask_filterILb1ELb0EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIaNS_19boolean_mask_filterILb1ELb0EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype13EEENS_19boolean_mask_filterILb0ELb1EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype13EEENS_19boolean_mask_filterILb0ELb1EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype11EEENS_19boolean_mask_filterILb0ELb1EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype11EEENS_19boolean_mask_filterILb0ELb1EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype10EEENS_19boolean_mask_filterILb0ELb1EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype10EEENS_19boolean_mask_filterILb0ELb1EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype9EEENS_19boolean_mask_filterILb0ELb1EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype9EEENS_19boolean_mask_filterILb0ELb1EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype8EEENS_19boolean_mask_filterILb0ELb1EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype8EEENS_19boolean_mask_filterILb0ELb1EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIaL9gdf_dtype7EEENS_19boolean_mask_filterILb0ELb1EEELi256ELb0EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIN4cudf6detail7wrapperIaL9gdf_dtype7EEENS_19boolean_mask_filterILb0ELb1EEELi256ELb1EEEvPT_PjPiPKS8_PKjSB_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIdNS_19boolean_mask_filterILb0ELb1EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIdNS_19boolean_mask_filterILb0ELb1EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIfNS_19boolean_mask_filterILb0ELb1EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIfNS_19boolean_mask_filterILb0ELb1EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIlNS_19boolean_mask_filterILb0ELb1EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIlNS_19boolean_mask_filterILb0ELb1EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIiNS_19boolean_mask_filterILb0ELb1EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIiNS_19boolean_mask_filterILb0ELb1EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIsNS_19boolean_mask_filterILb0ELb1EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIsNS_19boolean_mask_filterILb0ELb1EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIaNS_19boolean_mask_filterILb0ELb1EEELi256ELb0EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN77_GLOBAL__N__53_tmpxft_0000295d_00000000_6_apply_boolean_mask_cpp1_ii_a5002f4814scatter_kernelIaNS_19boolean_mask_filterILb0ELb1EEELi256ELb1EEEvPT_PjPiPKS3_PKjS6_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype13EEEZNS2_25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb0EEEvPT_PjPiPKSC_PKjSF_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype13EEEZNS2_25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb1EEEvPT_PjPiPKSC_PKjSF_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype11EEEZNS2_25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb0EEEvPT_PjPiPKSC_PKjSF_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype11EEEZNS2_25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb1EEEvPT_PjPiPKSC_PKjSF_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype10EEEZNS2_25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb0EEEvPT_PjPiPKSC_PKjSF_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype10EEEZNS2_25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb1EEEvPT_PjPiPKSC_PKjSF_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype9EEEZNS2_25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb0EEEvPT_PjPiPKSC_PKjSF_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIN4cudf6detail7wrapperIlL9gdf_dtype9EEEZNS2_25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb1EEEvPT_PjPiPKSC_PKjSF_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype8EEEZNS2_25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb0EEEvPT_PjPiPKSC_PKjSF_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIN4cudf6detail7wrapperIiL9gdf_dtype8EEEZNS2_25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb1EEEvPT_PjPiPKSC_PKjSF_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIN4cudf6detail7wrapperIaL9gdf_dtype7EEEZNS2_25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb0EEEvPT_PjPiPKSC_PKjSF_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIN4cudf6detail7wrapperIaL9gdf_dtype7EEEZNS2_25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb1EEEvPT_PjPiPKSC_PKjSF_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIdZN4cudf6detail25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb0EEEvPT_PjPiPKS9_PKjSC_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIdZN4cudf6detail25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb1EEEvPT_PjPiPKS9_PKjSC_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIfZN4cudf6detail25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb0EEEvPT_PjPiPKS9_PKjSC_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIfZN4cudf6detail25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb1EEEvPT_PjPiPKS9_PKjSC_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIlZN4cudf6detail25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb0EEEvPT_PjPiPKS9_PKjSC_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIlZN4cudf6detail25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb1EEEvPT_PjPiPKS9_PKjSC_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIiZN4cudf6detail25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb0EEEvPT_PjPiPKS9_PKjSC_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIiZN4cudf6detail25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb1EEEvPT_PjPiPKS9_PKjSC_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIsZN4cudf6detail25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb0EEEvPT_PjPiPKS9_PKjSC_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIsZN4cudf6detail25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb1EEEvPT_PjPiPKS9_PKjSC_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIaZN4cudf6detail25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb0EEEvPT_PjPiPKS9_PKjSC_iiT0_ is out of range. .minnctapersm will be ignored ptxas warning : Value of threads per SM for entry _ZN66_GLOBAL__N__42_tmpxft_00002fbf_00000000_6_scatter_cpp1_ii_d72f380914scatter_kernelIaZN4cudf6detail25ordered_scatter_to_tablesERKNS1_5tableEPKiiEUliE_Li256ELb1EEEvPT_PjPiPKS9_PKjSC_iiT0_ is out of range. .minnctapersm will be ignored |
I think it's really the assumption in this line:
The 2048 in (2048/block_size) assumes that 2048/block_size is the max number of threads per SM that can be resident. However, if there is any shared memory use (which can presumably be hidden in cub or thrust templates), it might be less than that. Or the arch itself could somehow be limited to 32 warps/SM ~ ideally we could query this in the pre-processor like the SM arch, but not sure there is anything like that. Changing this to 1024 / block_size would eliminate the warning, but could also negatively impact perf ~ on the other hand, blindly attempting to coerce ptxas into reaching max occupancy can be detrimental to perf, in which case removing the 2nd part of the __launch_bounds might very well improve perf (minctapersm is generally a case-by-case thing in my experience, and nobody is even looking at occupancy/register usage in cudf, so the ",2048/block_size" seems sketchy to me). |
I'll run the scatter benchmark with 2048 and 1024 numerators for block count and see what the diff is. If it's trivial or better perf in 1024, is that likely to translate to better perf on other architectures too? Alternatively, if 2048 is better, are we OK living with the |
@harrism this is your code, so you may want to take a look. |
Here are the results I'm seeing. I'm not familiar enough with the expected perf to know whether the variance in the numbers are conclusively better or worse: gbenchmarks/SCATTER_BENCH with block_count=2048/block_size2019-10-17 01:05:27 Running gbenchmarks/SCATTER_BENCH Run on (24 X 3700 MHz CPU s) CPU Caches: L1 Data 32K (x12) L1 Instruction 32K (x12) L2 Unified 1024K (x12) L3 Unified 19712K (x2) Load Average: 1.10, 1.01, 1.04 ------------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... ------------------------------------------------------------------------------------------------------------- benchmark/double_coalesce_x/1024/1/manual_time 49987 ns 70968 ns 13066 bytes_per_second=312.582M/s benchmark/double_coalesce_x/2048/1/manual_time 50374 ns 71290 ns 12867 bytes_per_second=620.364M/s benchmark/double_coalesce_x/4096/1/manual_time 50208 ns 71018 ns 12861 bytes_per_second=1.21564G/s benchmark/double_coalesce_x/8192/1/manual_time 50237 ns 70939 ns 12883 bytes_per_second=2.42989G/s benchmark/double_coalesce_x/16384/1/manual_time 50795 ns 71165 ns 12894 bytes_per_second=4.80642G/s benchmark/double_coalesce_x/32768/1/manual_time 51119 ns 71126 ns 12768 bytes_per_second=9.55178G/s benchmark/double_coalesce_x/65536/1/manual_time 52797 ns 71297 ns 11372 bytes_per_second=18.4966G/s benchmark/double_coalesce_x/131072/1/manual_time 53584 ns 71587 ns 12213 bytes_per_second=36.4498G/s benchmark/double_coalesce_x/262144/1/manual_time 64171 ns 82393 ns 10831 bytes_per_second=60.8721G/s benchmark/double_coalesce_x/524288/1/manual_time 102433 ns 218136 ns 7973 bytes_per_second=76.2694G/s benchmark/double_coalesce_x/1048576/1/manual_time 294898 ns 305420 ns 4965 bytes_per_second=52.9843G/s benchmark/double_coalesce_x/2097152/1/manual_time 412884 ns 415107 ns 2814 bytes_per_second=75.6872G/s benchmark/double_coalesce_x/4194304/1/manual_time 531340 ns 535555 ns 1000 bytes_per_second=117.627G/s benchmark/double_coalesce_x/8388608/1/manual_time 1191458 ns 1205765 ns 668 bytes_per_second=104.913G/s benchmark/double_coalesce_x/16777216/1/manual_time 2191371 ns 2181062 ns 349 bytes_per_second=114.084G/s Segmentation fault (core dumped) gbenchmarks/SCATTER_BENCH with block_count=1024/block_size2019-10-17 01:08:17 Running gbenchmarks/SCATTER_BENCH Run on (24 X 3700 MHz CPU s) CPU Caches: L1 Data 32K (x12) L1 Instruction 32K (x12) L2 Unified 1024K (x12) L3 Unified 19712K (x2) Load Average: 0.59, 0.89, 1.01 ------------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... ------------------------------------------------------------------------------------------------------------- benchmark/double_coalesce_x/1024/1/manual_time 46689 ns 66974 ns 14927 bytes_per_second=334.664M/s benchmark/double_coalesce_x/2048/1/manual_time 46796 ns 67194 ns 14810 bytes_per_second=667.799M/s benchmark/double_coalesce_x/4096/1/manual_time 46482 ns 66652 ns 13968 bytes_per_second=1.31309G/s benchmark/double_coalesce_x/8192/1/manual_time 47821 ns 67955 ns 13820 bytes_per_second=2.55267G/s benchmark/double_coalesce_x/16384/1/manual_time 47626 ns 67416 ns 13630 bytes_per_second=5.12617G/s benchmark/double_coalesce_x/32768/1/manual_time 48125 ns 67645 ns 13653 bytes_per_second=10.1461G/s benchmark/double_coalesce_x/65536/1/manual_time 49762 ns 67667 ns 13100 bytes_per_second=19.6245G/s benchmark/double_coalesce_x/131072/1/manual_time 50832 ns 68280 ns 13037 bytes_per_second=38.4231G/s benchmark/double_coalesce_x/262144/1/manual_time 64527 ns 82484 ns 10903 bytes_per_second=60.5371G/s benchmark/double_coalesce_x/524288/1/manual_time 100105 ns 204639 ns 7657 bytes_per_second=78.0434G/s benchmark/double_coalesce_x/1048576/1/manual_time 296899 ns 308353 ns 4881 bytes_per_second=52.6274G/s benchmark/double_coalesce_x/2097152/1/manual_time 414626 ns 418521 ns 2793 bytes_per_second=75.3691G/s benchmark/double_coalesce_x/4194304/1/manual_time 540594 ns 543664 ns 1000 bytes_per_second=115.613G/s benchmark/double_coalesce_x/8388608/1/manual_time 1169875 ns 1186101 ns 632 bytes_per_second=106.849G/s benchmark/double_coalesce_x/16777216/1/manual_time 2199527 ns 2193205 ns 350 bytes_per_second=113.661G/s Segmentation fault (core dumped) The benchmark eventually segfaults attempting to allocate all the memory on the device. It starts by allocating 1/2 up front, then grows in spikes from there. |
I would also run the benchmark removing the ",X/block_size", eg: launch_bounds(block_size) only, and let the compiler decide register allocation. |
gbenchmarks/SCATTER_BENCH no block_count2019-10-17 17:17:15 Running gbenchmarks/SCATTER_BENCH Run on (24 X 3700 MHz CPU s) CPU Caches: L1 Data 32K (x12) L1 Instruction 32K (x12) L2 Unified 1024K (x12) L3 Unified 19712K (x2) Load Average: 1.42, 0.55, 0.26 ------------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... ------------------------------------------------------------------------------------------------------------- benchmark/double_coalesce_x/1024/1/manual_time 47063 ns 67617 ns 13886 bytes_per_second=332.003M/s benchmark/double_coalesce_x/2048/1/manual_time 47516 ns 67979 ns 13808 bytes_per_second=657.672M/s benchmark/double_coalesce_x/4096/1/manual_time 53595 ns 74825 ns 12226 bytes_per_second=1.13882G/s benchmark/double_coalesce_x/8192/1/manual_time 47971 ns 68178 ns 13553 bytes_per_second=2.54469G/s benchmark/double_coalesce_x/16384/1/manual_time 48286 ns 68214 ns 13719 bytes_per_second=5.05618G/s benchmark/double_coalesce_x/32768/1/manual_time 48143 ns 67683 ns 13616 bytes_per_second=10.1422G/s benchmark/double_coalesce_x/65536/1/manual_time 56178 ns 74915 ns 13031 bytes_per_second=17.3833G/s benchmark/double_coalesce_x/131072/1/manual_time 51389 ns 68919 ns 12943 bytes_per_second=38.007G/s benchmark/double_coalesce_x/262144/1/manual_time 64576 ns 82285 ns 10955 bytes_per_second=60.4906G/s benchmark/double_coalesce_x/524288/1/manual_time 102052 ns 224054 ns 7945 bytes_per_second=76.5539G/s benchmark/double_coalesce_x/1048576/1/manual_time 296321 ns 317989 ns 4894 bytes_per_second=52.73G/s benchmark/double_coalesce_x/2097152/1/manual_time 395626 ns 409061 ns 2811 bytes_per_second=78.9888G/s benchmark/double_coalesce_x/4194304/1/manual_time 515526 ns 529331 ns 1000 bytes_per_second=121.235G/s benchmark/double_coalesce_x/8388608/1/manual_time 1227143 ns 1243755 ns 610 bytes_per_second=101.863G/s benchmark/double_coalesce_x/16777216/1/manual_time 2277550 ns 2295213 ns 344 bytes_per_second=109.767G/s Segmentation fault (core dumped) |
@OlivierNV maybe I'm not getting the full picture since the benchmark doesn't finish, but it doesn't seem like lowering or removing the second launch parameter significantly affects performance. If you agree, I'm happy to remove the parameter so we can move forward on merging this PR. |
Yup, I vote for taking out the parameter and merge. |
Out of the loop here, but I'm assuming we don't have to worry about the fact that those tests are segfault-ing due to memory constraints on Paul's machine even though he's running 48/192 GB of device/host ram, respectively? |
I'm not worried about it since they're just benchmarks. |
This PR partially addresses #2902, adding the same gpu-architecture detection logic as cuML.
I expect CI scripts will need to include
-DGPU_ARCHS=ALL
in the cmake args, but will need help to find them all cc: @kkraus14.