-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] CUDA error checking/debugging #94
Conversation
add debug.cuh and related changes.
Guys, it's WIP, but review/advice are welcome. Todo:
|
k2/csrc/cuda/debug.cuh
Outdated
/** | ||
* \brief A static assertion | ||
* \param exp the compile-time boolean expression that must be true | ||
* \param msg an error message if exp is false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment format is not consistent with our conventions, we use @param
in other files and no \brief
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The \brief is OK with me as it's likely more correct. I am using a kind of retarded doxygen that I don't attempt to compile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx! I turn to learn about doxygen and will change to some suggesting conventions in the coming update.
k2/csrc/cuda/debug.cuh
Outdated
#elif (K2_PTX_ARCH >= 200) | ||
printf("CUDA error ID=%d, NAME=%s, [block (%d,%d,%d) thread (%d,%d,%d), %s, %d]: %s\n", | ||
error, cudaGetErrorName(error), | ||
blockIdx.z, blockIdx.y, blockIdx.x, threadIdx.z, threadIdx.y, threadIdx.x, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's strange that the order is z, y, x
instead of x, y, z
. and can we add kernel name here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, where this function is supposed to be called? It seems it will be called with return value (cudaError_t error
) from some cuda api, if it's not a kernel, how can we get threadIdx
and blockIdx
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please give an example of how this would be called?
I want to see what kind of code it's intended for.
I see that there are examples in CUDA API's of device functions that return cudaError_t. But they're probably not common... if this is not intended to be used frequently, please clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's strange that the order is
z, y, x
instead ofx, y, z
. and can we add kernel name here?
As the CUDA thread/block indexes are column-major distributed. I.e. they are virtual concepts for developers. For the indexing of them, if it's print like (A, B, C)
, then it implies [a][b][c]
as a way to index them. Thus, according to the column-major, a=z, b=y, c=x
. However, I can't found a convention to log them. So, either way is ok to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, where this function is supposed to be called? It seems it will be called with return value (
cudaError_t error
) from some cuda api, if it's not a kernel, how can we getthreadIdx
andblockIdx
here?
Here, a cuda erro has raised. The error raises from either the host part or the kernel/device part of one cuda runtime-API or kernels.
The K2_PTX_ARCH is used to solve this. During compiling, as to the host part, it's 0
; as to the kernel part, it's not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please give an example of how this would be called?
I want to see what kind of code it's intended for.
I see that there are examples in CUDA API's of device functions that return cudaError_t. But they're probably not common... if this is not intended to be used frequently, please clarify.
OK. I am writing gtests/examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the CUDA thread/block indexes are column-major distributed
IIRC,the index is row-major...anyway, I think (x,y,z) is more straightforward to most people, this is also used by most of cuda examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The K2_PTX_ARCH is used to solve this. During compiling, as to the host part, it's 0; as to the kernel part, it's not.
Sorry, I don't understand it clearly, Can you explain it with some code? I mean, How can you know it's host code or device code and annotate the code with macro K2_PTX_ARCH
? I did't find the related code in the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The K2_PTX_ARCH is used to solve this. During compiling, as to the host part, it's 0; as to the kernel part, it's not.
Sorry, I don't understand it clearly, Can you explain it with some code? I mean, How can you know it's host code or device code and annotate the code with macro
K2_PTX_ARCH
? I did't find the related code in the PR.
It's defined in cuda/arch.h
with comments.
k2/csrc/cuda/debug_test.cu
Outdated
int *unallocated_array = static_cast<int *>(malloc(30 * sizeof(int))); | ||
// device call | ||
{ | ||
FillContents<<<2, 3>>>(5, unallocated_array); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this test run successfully? It's weird that it can access host memory in kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
k2/csrc/cuda/debug.cuh
Outdated
#endif | ||
#endif | ||
|
||
} // end namespace k2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to our conventions, there's no end
, just } // namespace k2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend you installing clang-format
to format your code automatically on save.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
cmake/gtest_utils.cmake
Outdated
enable_testing() | ||
|
||
function(k2_add_gtest) | ||
cmake_parse_arguments( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cmake code is very confusing, for me and I'm sure the majority of readers . Can you please add some kind of comment that's aimed at someone who doesn't really understand cmake? That explains what implications it has for usage, in terms of what flags you can pass or what behaviors it causes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, maybe I can change another way to reuse the gtest function. But for now, as to add a CMake module doesn't break things, I would like to put it here for convenience.
k2/csrc/cuda/debug.cuh
Outdated
@@ -0,0 +1,169 @@ | |||
// k2/csrc/cuda/errors.h | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The convention is that we're just calling files .h even if they may contain cuda code, because due to the structure of the library almost every file can potentially instantiate cuda code even if it's not explicitly present. I know this may seem odd, but for me it's convenient as my emacs recognizes as C++, and I think it will make the files seem less scary to people.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
k2/csrc/cuda/debug.cuh
Outdated
|
||
/** | ||
* \brief Macro for checking cuda error. | ||
* If not cudaSuccess and K2_MAKE_ERROR_CHECK, print and exit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be more precise: "if K2_MAKE_ERROR_CHECK is defined and e is not cudaSuccess"... and clarify that it does nothing otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
cmake/gtest_utils.cmake
Outdated
endfunction() | ||
|
||
# directly give the suite name source files, | ||
function(k2_add_gtests test_set_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the following template
https://github.com/danpovey/k2/blob/2a6ba24b4a7845bab757fbf7044b0d6253bcb2d8/k2/csrc/CMakeLists.txt#L29-L41
not good enough that you want to make your own?
I find this cmake script hard to read and understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's not quite readable. My motivation was to make this function reusable (and could keep gtest-compile out of make
process, e.g. only compile and run if make test
.). As simply put the above codes into a CMake module, it still requires the sources and dep-libs. If you will, please help make this function reusable?
k2/csrc/cuda/debug.cuh
Outdated
#else | ||
// A hack to implement the variadic printf as clang for other c-compilers, | ||
// and sielence the warning | ||
#pragma clang diagnostic ignored "-Wc++11-extensions" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if a user does not use clang?
What are the purposes of these two pragmas?
If they are really needed, please wrap them with pragma push and pop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a mistake with the comments. This pragma
is for clang, and you are right, a push and pop guard is added.
k2/csrc/cuda/debug.cuh
Outdated
* https://github.com/NVlabs/cub/blob/c3cceac115c072fb63df1836ff46d8c60d9eb304/cub/util_debug.cuh#L109 | ||
*/ | ||
#if !defined(K2_DLOG) | ||
#if !(defined(__clang__) && defined(__CUDA__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indentation looks weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😂 Please give more details. As to me, the auto-format of my IDE had aligned the macros but it made it less readable i think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I know what you mean. An indent error here is fixed now.
k2/csrc/cuda/debug.cuh
Outdated
* | ||
* \return The CUDA error. | ||
*/ | ||
__host__ __device__ __forceinline__ cudaError_t K2_CUDA_DEBUG( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function K2_CUDA_DEBUG
is already inside the namespace k2
, so there is
no need to put a k2
in it.
K2_CUDA_DEBUG
should be a macro name, not a function name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
K2_CUDA_DEBUG should be a macro name, not a function name
I agree with @csukuangfj here , it's better if we can change K2_CUDA_DEBUG
to follow name convention for functions (as it's a function instead of macro), e.g. CudaDebug
, as you have defined macros below (e.g. K2_CUDA_CHECK_ERROR
) to provide __FILE__
and __LINE__
. I suppose function K2_CUDA_DEBUG
will not be called directly by users.
BTW, I'm in favour of macros with K2_
prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void CudaDebug(....) {....}
#define K2_CUDA_DEBUG(...) CudaDebug(...)
This is what I meant.
It's fine to have K2
in the macro, but I don't think it is needed in a function which is inside the namespace k2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function
K2_CUDA_DEBUG
is already inside the namespacek2
, so there is
no need to put ak2
in it.
I am not very certain. But if it gets called out of the k2
namespace.. as it's an inline function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
K2_CUDA_DEBUG should be a macro name, not a function name
I agree with @csukuangfj here , it's better if we can change
K2_CUDA_DEBUG
to follow name convention for functions (as it's a function instead of macro), e.g.CudaDebug
, as you have defined macros below (e.g.K2_CUDA_CHECK_ERROR
) to provide__FILE__
and__LINE__
. I suppose functionK2_CUDA_DEBUG
will not be called directly by users.BTW, I'm in favour of macros with
K2_
prefix.
The inline function name changed from "K2_CUDA_DEBUG" to "_K2DebugLog", "_" indicates it's an inner func.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All functions in the namespace k2, except this one, do not contain k2
, so what is the reason for adding k2 to this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All functions in the namespace k2, except this one, do not contain
k2
, so what is the reason for adding k2 to this function?
I know what you mean, but you may not try to get my point. For example:
#include <iostream>
namespace k2 {
inline void Print() {
std::cout << "Hello World" << std::endl;
}
#define PP Print()
}
int main () {
PP;
return 0;
}
compile and get error:
note: 'k2::Print' declared here
inline void Print() {
^
1 error generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use
#define PP k2::Print()
The compiler gives a hint that you can use k2::Print
in the macro.
Mm, I'm OK with leaving the name as K2_CUDA_DEBUG just to follow the
pattern of other things like that, which are macros.
…On Wed, Aug 26, 2020 at 6:27 PM Fangjun Kuang ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In k2/csrc/cuda/debug.cuh
<https://github.com/danpovey/k2/pull/94#discussion_r477198638>:
> + inline __host__ __device__ void va_printf(char const* format, Args const&... args) {
+ #ifdef __CUDA_ARCH__
+ printf(format, blockIdx.z, blockIdx.y, blockIdx.x, threadIdx.z, threadIdx.y, threadIdx.x, args...);
+ #else
+ printf(format, args...);
+ #endif
+ }
+ #ifndef __CUDA_ARCH__
+ #define K2_DLOG(format, ...) va_printf(format,__VA_ARGS__)
+ #else
+ #define K2_DLOG(format, ...) va_printf("[block (%d,%d,%d), thread (%d,%d,%d)]: " format, __VA_ARGS__)
+ #endif
+ #endif
+#endif
+
+} // end namespace k2
I would recommend you installing clang-format to format your code
automatically on save.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<https://github.com/danpovey/k2/pull/94#discussion_r477198638>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO2QZY43ODIB7EBFF2TSCTPSNANCNFSM4QLPYQAA>
.
|
k2/csrc/cuda/debug.cuh
Outdated
|
||
// See LICENSE for clarification regarding multiple authors | ||
|
||
#ifndef K2_CSRC_CUDA_ERRORS_H_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please run the cpplint script in the scripts
folder before committing.
It can help you spot the incorrect header guard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I change it manually this time. And will find a way to make it take effect for my IDE.
OK.
…On Wed, Aug 26, 2020 at 6:49 PM Haowen Qiu ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In k2/csrc/cuda/debug.cuh
<https://github.com/danpovey/k2/pull/94#discussion_r477209444>:
> +#ifndef K2_STATIC_ASSERT
+ #define K2_STATIC_ASSERT(exp, msg) static_assert(exp, msg)
+#endif
+
+/**
+ * \brief This is a error checking function, with context information.
+ * If K2_MAKE_ERROR_CHECK is defined and error is not cudaSuccess,
+ * the corresponding error message is printed to:
+ * - host: stderr
+ * - device: stdout in device
+ * along with the supplied source context.
+ * It's used to make other macros convenient.
+ *
+ * \return The CUDA error.
+ */
+__host__ __device__ __forceinline__ cudaError_t K2_CUDA_DEBUG(
K2_CUDA_DEBUG should be a macro name, not a function name
I agree with @csukuangfj <https://github.com/csukuangfj> here , it's
better if we can change K2_CUDA_DEBUG to follow name convention for
functions (as it's a function instead of macro), e.g. CudaDebug, as you
have defined macros below (e.g. K2_CUDA_CHECK_ERROR) to provide __FILE__
and __LINE__. I suppose function K2_CUDA_DEBUG will not be called
directly by users.
BTW, I'm in favour of macros with K2_ prefix.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<https://github.com/danpovey/k2/pull/94#discussion_r477209444>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO7E6JAHX4OKZ23BSS3SCTSFFANCNFSM4QLPYQAA>
.
|
k2/csrc/cuda/debug.h
Outdated
/** | ||
* @def K2_DLOG | ||
* | ||
* @brief Log macro for printf statements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clarify how it is supposed to be used with an example!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Please see the gtest about it. Though I'm afraid �it's not as good as expected, I need to know the exactly expected usages so I can make changes accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@songmeixu please make sure there is an example in the documentation! People won't know to look in the test file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, the usage example is put @code tag as doxygen suggests.
k2/csrc/cuda/debug_test.cu
Outdated
int *unallocated_array = static_cast<int *>(malloc(30 * sizeof(int))); | ||
// device call K2_DLOG | ||
{ | ||
helloCUDA<<<1, 5>>>(1.2345f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent capitalization here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
k2/csrc/cuda/debug_test.cu
Outdated
/* print out 100 evenly spaced square roots just to see that it worked */ | ||
unsigned int i; | ||
for (i = 0; i < N; i += (N / 100)) { | ||
K2_DLOG("sqrt(%d) = %lf\n", i, roots[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe something just like assert(sqrt(i) == roots[i])
is better than print values here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gtest EXPECT_EQ
is now used to check this.
{ | ||
HelloCUDA<<<1, 5>>>(1.2345f); | ||
FillContents<<<2, 3>>>(5, unallocated_array); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
free(unallocated_array)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not to use std::array
or std::vector
? They manage the memory for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this uses unified memory... is this on by default, and does it impose a constraint on compute capability?
I had no plans to make use of unified memory, actually. but OK for testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it's not unified memory, unified memory should be allocated with cudaMallocManage
. BTW, I think we should not use unified memory even in test code, as for our code, all memory should be allocated by using Context
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is to use std::array
or std::vector
to replace malloc
. If it needs to be accessed by GPU,
we can use cudaMalloc
and cudaMemcpy
to make it available to GPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
free(unallocated_array)
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is to use
std::array
orstd::vector
to replacemalloc
. If it needs to be accessed by GPU,
we can usecudaMalloc
andcudaMemcpy
to make it available to GPU.
I see. As the device code like C-style data argument, and it's not an important thing.. So, I add the missing free
and keep C-style here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whether style
is an important thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😂 That what I want to say. The thing is you wanna me change to the other style... and you know it's not an important thing.
So how does the CUDA code successfully write to the CPU memory?
…On Fri, Aug 28, 2020 at 9:07 AM Haowen Qiu ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In k2/csrc/cuda/debug_test.cu
<https://github.com/danpovey/k2/pull/94#discussion_r478775748>:
> + for (i = 0; i < N; i += (N / 100)) {
+ K2_DLOG("sqrt(%d) = %lf\n", i, roots[i]);
+ }
+ }
+
+ /* free the CPU memory */
+ free(roots);
+}
+
+TEST(DebugTest, K2DLog) {
+ int *unallocated_array = static_cast<int *>(malloc(30 * sizeof(int)));
+ // device call K2_DLOG
+ {
+ HelloCUDA<<<1, 5>>>(1.2345f);
+ FillContents<<<2, 3>>>(5, unallocated_array);
+ }
Actually it's not unified memory, unified memory should be allocated with
cudaMallocManage. BTW, I think we should not use unified memory even in
test code, as for our code, all memory should be allocated by using
Context.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<https://github.com/danpovey/k2/pull/94#discussion_r478775748>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO77F4KQVT6YG322FRDSC37LHANCNFSM4QLPYQAA>
.
|
As I comment above, I think the current code can not be run successfully as kernel tries to access host memory directly, I guess this is only WIP test code. Maybe @songmeixu is the best one to answer this. |
k2/csrc/cuda/debug.h
Outdated
@@ -176,7 +176,7 @@ __host__ __device__ __forceinline__ cudaError_t K2_CUDA_DEBUG( | |||
* | |||
* @remark | |||
* macro `__VA_ARGS__` is used to pass the kernel<<<...>>> as one argument, | |||
* otherwise the compiler rise a error "passed 2 arguments, but takes just 1". | |||
* otherwise the compiler rises a error "passed 2 arguments, but takes just 1". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raises an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
and update for last review; make some trival changes
k2/csrc/cuda/debug.h
Outdated
*/ | ||
#if !defined(K2_DLOG) | ||
#if !(defined(__clang__) && defined(__CUDA__)) | ||
#if (K2_PTX_ARCH == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here can just do #ifndef CUDA_ARCH
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
k2/csrc/cuda/debug.h
Outdated
#if !(defined(__clang__) && defined(__CUDA__)) | ||
#if (K2_PTX_ARCH == 0) | ||
#define K2_DLOG(format, ...) printf(format,__VA_ARGS__) | ||
#elif (K2_PTX_ARCH >= 200) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here:
#if (defined(__CUDA_ARCH__ ) && (__CUDA_ARCH__ >= 200)
(the c preprocessor si OK with this). so no need for K2_PTX_ARCH. Keep it simple!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after comma (I'm trying to follow Google C++ style generally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
k2/csrc/cuda/debug.h
Outdated
* @details If in NDEBUG mode, all macros here would do nothing. | ||
*/ | ||
#if (!defined(NDEBUG) && !defined(K2_MAKE_ERROR_CHECK)) | ||
#define K2_MAKE_ERROR_CHECK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
..remove this..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
- remove macro redefine guards - replace K2_MAKE_ERROR_CHECK with NDEBUG
According to the above discussion, I get a core principle, that to keep codes simple or minimal for usage. So I made some changes as above. BTW, I think the comments are too lot. So I am going to delete much of them, especially note/remark that actually are general knowledge or wordy. |
Just want to make sure that any C++ code checked into K2 is of high quality. If you have read the comment, you can find that a large part of them are about code style. Not sure whether you have
To get things done is one thing, readability is also very important. You'll make the later maintainer cry. I've read the code from https://github.com/XiaoMi/mace and I guess people from Xiaomi also care about code readability |
I would be grateful if someone can take his/her time to review my code. |
as I tried with clang with cuda, and finally the previous error dismissed: "function not viable: requires 2 arguments, but 4 were provided extern int vprintf (const char *__restrict __format, _G_va_list __arg);"
@@ -56,7 +56,7 @@ __global__ void HelloCUDA(float f) { | |||
} | |||
|
|||
// A vector add kernel definition | |||
__global__ void VecAdd(float* A, float* B, float* C) { | |||
__global__ void VecAdd(float *A, float *B, float *C) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find that you have deleted the comment about adding const
for the two input arguments A
and B
.
The signature of this function tells us A
and B
will be changed inside the function,
but actually they are not. So it is really confusing for such a signature without const
.
I would give a -1 vote.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, should use const where possible.
Also please add the extra checking macros like K2_CHECK_GE and so on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't delete things or IDK what you mean. The comment of issue or code? as to the issue it's still above.. As to the code, IDK what comments. Sorry.
k2/csrc/cuda/debug.h
Outdated
* @endcode | ||
*/ | ||
#ifndef __CUDA_ARCH__ | ||
#define K2_DLOG(format, ...) printf(format,__VA_ARGS__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after comma
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Never distinguish API or KERNEl call, instead merge into one: K2_CUDA_SAFE_CALL.
I do read the K2 code and google style is always my favorite. I made style check as pre-commit/push requirement for one branch about K2 called cppyy in my fork. And I know code review is valuable. I appreciate it. And as many company code repos, the serious code cannot get pass w/o +2 vote after automatic style checking. However, you must know my point: I don't feel the same standard is work here for everyone every commits. I don't find the master branch is all in right and consistent style, isn't it? And in the case that C-styles or Cpp-style, why I need to change to the other way? And even the static checker wouldn't remind the const thing for c-style. To be perfect, won't However, I am definitely the guy pursues the clean code (I dare say I know the most tools that help), but with the least man effort and strong requirement from tools to get the goal, instead of all by man write and check. As I planed, I should already be working on cppyy and made it help us at this weekend. But I took much time to be a perfect coder. I apology for my words. But I insist on tools instead of man should take these work. SORRY to be rude. You would know I am usually a kind man, just sometimes I know there is a better way then I cannot step the road underfoot. |
Sorry to be some rube. @csukuangfj. Found I was kind of unsatisfying with myself's progress. Feel not right now, would hug you next meet. Let's look ahead. Please keep your way, you're right. Thx!! |
Not everything from NVIDIA is that right.
I think you have the ability to change the demo code to your needs instead
of just copying and pasting.
…On Mon, Aug 31, 2020 at 2:18 AM Meixu Song ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In k2/csrc/cuda/debug_test.cu
<https://github.com/danpovey/k2/pull/94#discussion_r479800081>:
> @@ -56,7 +56,7 @@ __global__ void HelloCUDA(float f) {
}
// A vector add kernel definition
-__global__ void VecAdd(float* A, float* B, float* C) {
+__global__ void VecAdd(float *A, float *B, float *C) {
I didn't delete things or IDK what you mean. The comment of issue or code,
as to the issue it's still above.. As to the code, I post the page of doc:
[image: image]
<https://user-images.githubusercontent.com/705582/91666541-292d1780-eb30-11ea-9ecc-8209b7f2ad79.png>
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<https://github.com/danpovey/k2/pull/94#discussion_r479800081>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABIKIPDFMYAAC2QSJ3WRXILSDKJW3ANCNFSM4QLPYQAA>
.
|
* @endcode | ||
*/ | ||
#define K2_CUDA_CHECK_ERROR(e) \ | ||
if (::k2::_K2CudaDebug((cudaError_t)(e), __FILE__, __LINE__)) { \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will evaluate e twice! Not good!
#include <device_launch_parameters.h> | ||
#include <driver_types.h> | ||
|
||
#include <stdio.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use cstdio
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Thanks! Merging. |
1 similar comment
Thanks! Merging. |
#include <driver_types.h> | ||
|
||
#include <stdio.h> | ||
#include <assert.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this header file from cuda? or from c++ library <cassert>
? I know there's an assert
function in cuda header file, but I don't know how nvcc
decides which one to use.
Meixu, I just noticed this is to the wrong branch! |
* Cuda draft2 (#86) * Some code drafts * Updates to determinize draft.. * Progress on draft of CUDA stuff (sorry: lots of inconsistencies still) * Rework context/memory management for easier integration with external toolkits * Updates to Context * Add ragged.h * Add shape.h * Various fixes, added compose.cc * implement GetContext with parameter pack (#73) * Various fixes etc. * Update license; various updates on cuda stuff * Add utils header * Update the build system to support Cuda. (#75) * update the build system to support cuda. * add a google colab example for Cuda test. * enable CI for the cuda_draft branch. * resolve some comments. * Updates to compose algo. * Fix couple build issues * move shape to ragged_shape * More progress... * More progress... won't compile * More progress on CUDA draft (wont compile, sorry) * Working on composition... * matrix transpose in cuda (simple version) (#84) * more progress... * add performace test functions (#85) * various progress... * fix build issues (#87) Co-authored-by: Haowen Qiu <[email protected]> Co-authored-by: Fangjun Kuang <[email protected]> * cmake: set "Debug" as default, and add "-O0" for debugging (#83) * cmake: set "Debug" as default, and add "-O0" for debugging * make cmakelists.txt more pratical * change style of cmake custom variables * [WIP] CUDA error checking/debugging (#94) * [WIP] CUDA error checking/debugging add debug.cuh and related changes. * update #1 after review * update * update * trivials * replace get_filename_component, as it rises cmake version warning * add `K2_ASSERT, K2_CHECK_EQ, K2_PARANOID_ASSERT` and update for last review; make some trival changes * typo * trivals * fix a error * typo * Update debug.h * make last change consist with its comment * update to review * update to review * update as Dan required - remove macro redefine guards - replace K2_MAKE_ERROR_CHECK with NDEBUG * changes make gtest stop warn about usage of *_DEATH * remove utility macro about archtecture and the related file. * remove more complex less helpful cmake gtest module, and revert back * fix pointer pos style * continue to remove gtest util * rm special variadic printf for clang as I tried with clang with cuda, and finally the previous error dismissed: "function not viable: requires 2 arguments, but 4 were provided extern int vprintf (const char *__restrict __format, _G_va_list __arg);" * K2_CUDA_SAFE_CALL Never distinguish API or KERNEl call, instead merge into one: K2_CUDA_SAFE_CALL. * fix to review * update to review - fix a twice error code checking issue - change macro name `K2_CUDA_CHECK_ERROR` to `K2_CUDA_PRINT_ERROR` as its effect - add a optional arg `abort` to control abort if hit error - trivals fix * just revert name: `K2_CUDA_CHECK_ERROR` * update to review (#103) * update to review - fix a twice error code checking issue - change macro name `K2_CUDA_CHECK_ERROR` to `K2_CUDA_PRINT_ERROR` as its effect - add a optional arg `abort` to control abort if hit error - trivals fix * just revert name: `K2_CUDA_CHECK_ERROR` * rm two files that @dan deleted on master Co-authored-by: Daniel Povey <[email protected]> Co-authored-by: Haowen Qiu <[email protected]> Co-authored-by: Fangjun Kuang <[email protected]>
add debug.cuh and related changes.