[WIP] CUDA error checking/debugging #94

megazone87 · 2020-08-26T07:22:27Z

add debug.cuh and related changes.

megazone87 · 2020-08-26T07:28:37Z

Guys, it's WIP, but review/advice are welcome.

Todo:

write gtest/usages for this
replace current CUDA error checks (glog, not functional)

qindazhu · 2020-08-26T08:41:38Z

k2/csrc/cuda/debug.cuh

+/**
+ * \brief A static assertion
+ * \param exp the compile-time boolean expression that must be true
+ * \param msg an error message if exp is false


The comment format is not consistent with our conventions, we use @param in other files and no \brief

The \brief is OK with me as it's likely more correct. I am using a kind of retarded doxygen that I don't attempt to compile.

Thx! I turn to learn about doxygen and will change to some suggesting conventions in the coming update.

qindazhu · 2020-08-26T08:44:28Z

k2/csrc/cuda/debug.cuh

+  #elif (K2_PTX_ARCH >= 200)
+    printf("CUDA error ID=%d, NAME=%s, [block (%d,%d,%d) thread (%d,%d,%d), %s, %d]: %s\n",
+           error, cudaGetErrorName(error),
+           blockIdx.z, blockIdx.y, blockIdx.x, threadIdx.z, threadIdx.y, threadIdx.x,


it's strange that the order is z, y, x instead of x, y, z. and can we add kernel name here?

BTW, where this function is supposed to be called? It seems it will be called with return value (cudaError_t error) from some cuda api, if it's not a kernel, how can we get threadIdx and blockIdx here?

Can you please give an example of how this would be called?
I want to see what kind of code it's intended for.
I see that there are examples in CUDA API's of device functions that return cudaError_t. But they're probably not common... if this is not intended to be used frequently, please clarify.

it's strange that the order is z, y, x instead of x, y, z. and can we add kernel name here?

As the CUDA thread/block indexes are column-major distributed. I.e. they are virtual concepts for developers. For the indexing of them, if it's print like (A, B, C), then it implies [a][b][c] as a way to index them. Thus, according to the column-major, a=z, b=y, c=x. However, I can't found a convention to log them. So, either way is ok to me.

BTW, where this function is supposed to be called? It seems it will be called with return value (cudaError_t error) from some cuda api, if it's not a kernel, how can we get threadIdx and blockIdx here?

Here, a cuda erro has raised. The error raises from either the host part or the kernel/device part of one cuda runtime-API or kernels.
The K2_PTX_ARCH is used to solve this. During compiling, as to the host part, it's 0; as to the kernel part, it's not.

Can you please give an example of how this would be called?
I want to see what kind of code it's intended for.
I see that there are examples in CUDA API's of device functions that return cudaError_t. But they're probably not common... if this is not intended to be used frequently, please clarify.

OK. I am writing gtests/examples.

As the CUDA thread/block indexes are column-major distributed

IIRC，the index is row-major...anyway, I think (x,y,z) is more straightforward to most people, this is also used by most of cuda examples

The K2_PTX_ARCH is used to solve this. During compiling, as to the host part, it's 0; as to the kernel part, it's not.

Sorry, I don't understand it clearly, Can you explain it with some code? I mean, How can you know it's host code or device code and annotate the code with macro K2_PTX_ARCH ? I did't find the related code in the PR.

The K2_PTX_ARCH is used to solve this. During compiling, as to the host part, it's 0; as to the kernel part, it's not.

Sorry, I don't understand it clearly, Can you explain it with some code? I mean, How can you know it's host code or device code and annotate the code with macro K2_PTX_ARCH ? I did't find the related code in the PR.

It's defined in cuda/arch.h with comments.

qindazhu · 2020-08-26T09:06:54Z

k2/csrc/cuda/debug_test.cu

+  int *unallocated_array = static_cast<int *>(malloc(30 * sizeof(int)));
+  // device call
+  {
+    FillContents<<<2, 3>>>(5, unallocated_array);


Can this test run successfully? It's weird that it can access host memory in kernel.

qindazhu · 2020-08-26T09:09:01Z

k2/csrc/cuda/debug.cuh

+  #endif
+#endif
+
+}  // end namespace k2


according to our conventions, there's no end, just } // namespace k2

I would recommend you installing clang-format to format your code automatically on save.

danpovey · 2020-08-26T09:18:31Z

cmake/gtest_utils.cmake

+enable_testing()
+
+function(k2_add_gtest)
+  cmake_parse_arguments(


This cmake code is very confusing, for me and I'm sure the majority of readers . Can you please add some kind of comment that's aimed at someone who doesn't really understand cmake? That explains what implications it has for usage, in terms of what flags you can pass or what behaviors it causes?

Hmm, maybe I can change another way to reuse the gtest function. But for now, as to add a CMake module doesn't break things, I would like to put it here for convenience.

danpovey · 2020-08-26T09:23:30Z

k2/csrc/cuda/debug.cuh

@@ -0,0 +1,169 @@
+// k2/csrc/cuda/errors.h
+


The convention is that we're just calling files .h even if they may contain cuda code, because due to the structure of the library almost every file can potentially instantiate cuda code even if it's not explicitly present. I know this may seem odd, but for me it's convenient as my emacs recognizes as C++, and I think it will make the files seem less scary to people.

danpovey · 2020-08-26T09:42:18Z

k2/csrc/cuda/debug.cuh

+
+/**
+ * \brief Macro for checking cuda error.
+ * If not cudaSuccess and K2_MAKE_ERROR_CHECK, print and exit.


Be more precise: "if K2_MAKE_ERROR_CHECK is defined and e is not cudaSuccess"... and clarify that it does nothing otherwise.

csukuangfj · 2020-08-26T10:05:31Z

cmake/gtest_utils.cmake

+endfunction()
+
+# directly give the suite name source files,
+function(k2_add_gtests test_set_name)


Is the following template
https://github.com/danpovey/k2/blob/2a6ba24b4a7845bab757fbf7044b0d6253bcb2d8/k2/csrc/CMakeLists.txt#L29-L41
not good enough that you want to make your own?

I find this cmake script hard to read and understand.

I agree it's not quite readable. My motivation was to make this function reusable (and could keep gtest-compile out of make process, e.g. only compile and run if make test.). As simply put the above codes into a CMake module, it still requires the sources and dep-libs. If you will, please help make this function reusable?

csukuangfj · 2020-08-26T10:13:15Z

k2/csrc/cuda/debug.cuh

+  #else
+    // A hack to implement the variadic printf as clang for other c-compilers,
+    // and sielence the warning
+    #pragma clang diagnostic ignored "-Wc++11-extensions"


What happens if a user does not use clang?

What are the purposes of these two pragmas?

If they are really needed, please wrap them with pragma push and pop.

I made a mistake with the comments. This pragma is for clang, and you are right, a push and pop guard is added.

csukuangfj · 2020-08-26T10:19:03Z

k2/csrc/cuda/debug.cuh

+ * https://github.com/NVlabs/cub/blob/c3cceac115c072fb63df1836ff46d8c60d9eb304/cub/util_debug.cuh#L109
+ */
+#if !defined(K2_DLOG)
+  #if !(defined(__clang__) && defined(__CUDA__))


The indentation looks weird.

😂 Please give more details. As to me, the auto-format of my IDE had aligned the macros but it made it less readable i think.

OK, I think I know what you mean. An indent error here is fixed now.

csukuangfj · 2020-08-26T10:24:13Z

k2/csrc/cuda/debug.cuh

+ *
+ * \return The CUDA error.
+ */
+__host__ __device__ __forceinline__ cudaError_t K2_CUDA_DEBUG(


The function K2_CUDA_DEBUG is already inside the namespace k2, so there is
no need to put a k2 in it.

K2_CUDA_DEBUG should be a macro name, not a function name.

K2_CUDA_DEBUG should be a macro name, not a function name

I agree with @csukuangfj here , it's better if we can change K2_CUDA_DEBUG to follow name convention for functions (as it's a function instead of macro), e.g. CudaDebug, as you have defined macros below (e.g. K2_CUDA_CHECK_ERROR) to provide __FILE__ and __LINE__. I suppose function K2_CUDA_DEBUG will not be called directly by users.

BTW, I'm in favour of macros with K2_ prefix.

void CudaDebug(....) {....} #define K2_CUDA_DEBUG(...) CudaDebug(...)

This is what I meant.

It's fine to have K2 in the macro, but I don't think it is needed in a function which is inside the namespace k2.

The function K2_CUDA_DEBUG is already inside the namespace k2, so there is
no need to put a k2 in it.

I am not very certain. But if it gets called out of the k2 namespace.. as it's an inline function.

K2_CUDA_DEBUG should be a macro name, not a function name

I agree with @csukuangfj here , it's better if we can change K2_CUDA_DEBUG to follow name convention for functions (as it's a function instead of macro), e.g. CudaDebug, as you have defined macros below (e.g. K2_CUDA_CHECK_ERROR) to provide __FILE__ and __LINE__. I suppose function K2_CUDA_DEBUG will not be called directly by users.

BTW, I'm in favour of macros with K2_ prefix.

The inline function name changed from "K2_CUDA_DEBUG" to "_K2DebugLog", "_" indicates it's an inner func.

All functions in the namespace k2, except this one, do not contain k2, so what is the reason for adding k2 to this function?

All functions in the namespace k2, except this one, do not contain k2, so what is the reason for adding k2 to this function?

I know what you mean, but you may not try to get my point. For example:

#include <iostream> namespace k2 { inline void Print() { std::cout << "Hello World" << std::endl; } #define PP Print() } int main () { PP; return 0; }

compile and get error:

note: 'k2::Print' declared here inline void Print() { ^ 1 error generated.

Could you use

#define PP k2::Print()

The compiler gives a hint that you can use k2::Print in the macro.

danpovey · 2020-08-26T10:29:42Z

Mm, I'm OK with leaving the name as K2_CUDA_DEBUG just to follow the pattern of other things like that, which are macros.

…

On Wed, Aug 26, 2020 at 6:27 PM Fangjun Kuang ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In k2/csrc/cuda/debug.cuh <https://github.com/danpovey/k2/pull/94#discussion_r477198638>: > + inline __host__ __device__ void va_printf(char const* format, Args const&... args) { + #ifdef __CUDA_ARCH__ + printf(format, blockIdx.z, blockIdx.y, blockIdx.x, threadIdx.z, threadIdx.y, threadIdx.x, args...); + #else + printf(format, args...); + #endif + } + #ifndef __CUDA_ARCH__ + #define K2_DLOG(format, ...) va_printf(format,__VA_ARGS__) + #else + #define K2_DLOG(format, ...) va_printf("[block (%d,%d,%d), thread (%d,%d,%d)]: " format, __VA_ARGS__) + #endif + #endif +#endif + +} // end namespace k2 I would recommend you installing clang-format to format your code automatically on save. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/danpovey/k2/pull/94#discussion_r477198638>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAZFLO2QZY43ODIB7EBFF2TSCTPSNANCNFSM4QLPYQAA> .

csukuangfj · 2020-08-26T10:32:37Z

k2/csrc/cuda/debug.cuh

+
+// See LICENSE for clarification regarding multiple authors
+
+#ifndef K2_CSRC_CUDA_ERRORS_H_


Please run the cpplint script in the scripts folder before committing.
It can help you spot the incorrect header guard.

OK. I change it manually this time. And will find a way to make it take effect for my IDE.

danpovey · 2020-08-26T10:50:59Z

OK.

…

On Wed, Aug 26, 2020 at 6:49 PM Haowen Qiu ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In k2/csrc/cuda/debug.cuh <https://github.com/danpovey/k2/pull/94#discussion_r477209444>: > +#ifndef K2_STATIC_ASSERT + #define K2_STATIC_ASSERT(exp, msg) static_assert(exp, msg) +#endif + +/** + * \brief This is a error checking function, with context information. + * If K2_MAKE_ERROR_CHECK is defined and error is not cudaSuccess, + * the corresponding error message is printed to: + * - host: stderr + * - device: stdout in device + * along with the supplied source context. + * It's used to make other macros convenient. + * + * \return The CUDA error. + */ +__host__ __device__ __forceinline__ cudaError_t K2_CUDA_DEBUG( K2_CUDA_DEBUG should be a macro name, not a function name I agree with @csukuangfj <https://github.com/csukuangfj> here , it's better if we can change K2_CUDA_DEBUG to follow name convention for functions (as it's a function instead of macro), e.g. CudaDebug, as you have defined macros below (e.g. K2_CUDA_CHECK_ERROR) to provide __FILE__ and __LINE__. I suppose function K2_CUDA_DEBUG will not be called directly by users. BTW, I'm in favour of macros with K2_ prefix. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/danpovey/k2/pull/94#discussion_r477209444>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAZFLO7E6JAHX4OKZ23BSS3SCTSFFANCNFSM4QLPYQAA> .

danpovey · 2020-08-27T06:38:26Z

k2/csrc/cuda/debug.h

+/**
+ * @def K2_DLOG
+ *
+ * @brief Log macro for printf statements.


Please clarify how it is supposed to be used with an example!

Updated. Please see the gtest about it. Though I'm afraid �it's not as good as expected, I need to know the exactly expected usages so I can make changes accordingly.

@songmeixu please make sure there is an example in the documentation! People won't know to look in the test file.

Now, the usage example is put @code tag as doxygen suggests.

danpovey · 2020-08-27T06:43:15Z

k2/csrc/cuda/debug_test.cu

+  int *unallocated_array = static_cast<int *>(malloc(30 * sizeof(int)));
+  // device call K2_DLOG
+  {
+    helloCUDA<<<1, 5>>>(1.2345f);


Inconsistent capitalization here

qindazhu · 2020-08-27T08:37:26Z

k2/csrc/cuda/debug_test.cu

+    /* print out 100 evenly spaced square roots just to see that it worked */
+    unsigned int i;
+    for (i = 0; i < N; i += (N / 100)) {
+      K2_DLOG("sqrt(%d) = %lf\n", i, roots[i]);


maybe something just like assert(sqrt(i) == roots[i]) is better than print values here

gtest EXPECT_EQ is now used to check this.

qindazhu · 2020-08-27T08:53:48Z

k2/csrc/cuda/debug_test.cu

+  {
+    HelloCUDA<<<1, 5>>>(1.2345f);
+    FillContents<<<2, 3>>>(5, unallocated_array);
+  }


free(unallocated_array)

Why not to use std::array or std::vector? They manage the memory for you.

I guess this uses unified memory... is this on by default, and does it impose a constraint on compute capability?
I had no plans to make use of unified memory, actually. but OK for testing.

Actually it's not unified memory, unified memory should be allocated with cudaMallocManage. BTW, I think we should not use unified memory even in test code, as for our code, all memory should be allocated by using Context.

What I meant is to use std::array or std::vector to replace malloc. If it needs to be accessed by GPU,
we can use cudaMalloc and cudaMemcpy to make it available to GPU.

free(unallocated_array)

Fixed.

What I meant is to use std::array or std::vector to replace malloc. If it needs to be accessed by GPU,
we can use cudaMalloc and cudaMemcpy to make it available to GPU.

I see. As the device code like C-style data argument, and it's not an important thing.. So, I add the missing free and keep C-style here.

Not sure whether style is an important thing.

😂 That what I want to say. The thing is you wanna me change to the other style... and you know it's not an important thing.

danpovey · 2020-08-28T03:39:56Z

So how does the CUDA code successfully write to the CPU memory?

…

On Fri, Aug 28, 2020 at 9:07 AM Haowen Qiu ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In k2/csrc/cuda/debug_test.cu <https://github.com/danpovey/k2/pull/94#discussion_r478775748>: > + for (i = 0; i < N; i += (N / 100)) { + K2_DLOG("sqrt(%d) = %lf\n", i, roots[i]); + } + } + + /* free the CPU memory */ + free(roots); +} + +TEST(DebugTest, K2DLog) { + int *unallocated_array = static_cast<int *>(malloc(30 * sizeof(int))); + // device call K2_DLOG + { + HelloCUDA<<<1, 5>>>(1.2345f); + FillContents<<<2, 3>>>(5, unallocated_array); + } Actually it's not unified memory, unified memory should be allocated with cudaMallocManage. BTW, I think we should not use unified memory even in test code, as for our code, all memory should be allocated by using Context. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/danpovey/k2/pull/94#discussion_r478775748>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAZFLO77F4KQVT6YG322FRDSC37LHANCNFSM4QLPYQAA> .

qindazhu · 2020-08-28T03:49:26Z

So how does the CUDA code successfully write to the CPU memory?

As I comment above, I think the current code can not be run successfully as kernel tries to access host memory directly, I guess this is only WIP test code. Maybe @songmeixu is the best one to answer this.

csukuangfj · 2020-08-28T07:39:50Z

k2/csrc/cuda/debug.h

@@ -176,7 +176,7 @@ __host__ __device__ __forceinline__ cudaError_t K2_CUDA_DEBUG(
 *
 * @remark
 *  macro `__VA_ARGS__` is used to pass the kernel<<<...>>> as one argument,
- *  otherwise the compiler rise a error "passed 2 arguments, but takes just 1".
+ *  otherwise the compiler rises a error "passed 2 arguments, but takes just 1".


raises an error

and update for last review; make some trival changes

danpovey · 2020-08-30T09:44:40Z

k2/csrc/cuda/debug.h

+ */
+#if !defined(K2_DLOG)
+  #if !(defined(__clang__) && defined(__CUDA__))
+    #if (K2_PTX_ARCH == 0)


Here can just do #ifndef CUDA_ARCH

danpovey · 2020-08-30T09:46:25Z

k2/csrc/cuda/debug.h

+  #if !(defined(__clang__) && defined(__CUDA__))
+    #if (K2_PTX_ARCH == 0)
+      #define K2_DLOG(format, ...) printf(format,__VA_ARGS__)
+    #elif (K2_PTX_ARCH >= 200)


and here:

#if (defined(__CUDA_ARCH__ ) && (__CUDA_ARCH__ >= 200)

(the c preprocessor si OK with this). so no need for K2_PTX_ARCH. Keep it simple!

space after comma (I'm trying to follow Google C++ style generally.

danpovey · 2020-08-30T10:12:21Z

k2/csrc/cuda/debug.h

+ * @details If in NDEBUG mode, all macros here would do nothing.
+ */
+#if (!defined(NDEBUG) && !defined(K2_MAKE_ERROR_CHECK))
+#define K2_MAKE_ERROR_CHECK


..remove this..

- remove macro redefine guards - replace K2_MAKE_ERROR_CHECK with NDEBUG

megazone87 · 2020-08-30T11:26:40Z

According to the above discussion, I get a core principle, that to keep codes simple or minimal for usage. So I made some changes as above.

BTW, I think the comments are too lot. So I am going to delete much of them, especially note/remark that actually are general knowledge or wordy.

csukuangfj · 2020-08-30T14:41:11Z

Just want to make sure that any C++ code checked into K2 is of high quality.

If you have read the comment, you can find that a large part of them are about code style. Not sure whether you have
read the google c++ code style guide or have read the K2 source code. If you read your own code carefully, you can find
that even your own code has different styles, e.g., function names, function arguments, placement of const and * for variables, indentation of macros, etc. It is totally fine that you have your own style. But for an existing code base, I think
the code becomes less readable if it contains different styles.

I agree it's not quite readable.
as to add a CMake module doesn't break things, I would like to put it here for convenience.

To get things done is one thing, readability is also very important. You'll make the later maintainer cry.

I've read the code from https://github.com/XiaoMi/mace and I guess people from Xiaomi also care about code readability
and consistency.

csukuangfj · 2020-08-30T14:44:28Z

I would be grateful if someone can take his/her time to review my code.

as I tried with clang with cuda, and finally the previous error dismissed: "function not viable: requires 2 arguments, but 4 were provided extern int vprintf (const char *__restrict __format, _G_va_list __arg);"

csukuangfj · 2020-08-30T14:54:01Z

k2/csrc/cuda/debug_test.cu

@@ -56,7 +56,7 @@ __global__ void HelloCUDA(float f) {
 }

 // A vector add kernel definition
-__global__ void VecAdd(float* A, float* B, float* C) {
+__global__ void VecAdd(float *A, float *B, float *C) {


I find that you have deleted the comment about adding const for the two input arguments A and B.

The signature of this function tells us A and B will be changed inside the function,
but actually they are not. So it is really confusing for such a signature without const.

I would give a -1 vote.

Agreed, should use const where possible.
Also please add the extra checking macros like K2_CHECK_GE and so on.

I didn't delete things or IDK what you mean. The comment of issue or code? as to the issue it's still above.. As to the code, IDK what comments. Sorry.

danpovey · 2020-08-30T15:39:08Z

k2/csrc/cuda/debug.h

+ * @endcode
+ */
+#ifndef __CUDA_ARCH__
+  #define K2_DLOG(format, ...) printf(format,__VA_ARGS__)


space after comma

Never distinguish API or KERNEl call, instead merge into one: K2_CUDA_SAFE_CALL.

megazone87 · 2020-08-30T18:07:49Z

I do read the K2 code and google style is always my favorite. I made style check as pre-commit/push requirement for one branch about K2 called cppyy in my fork. And I know code review is valuable. I appreciate it. And as many company code repos, the serious code cannot get pass w/o +2 vote after automatic style checking.

However, you must know my point: I don't feel the same standard is work here for everyone every commits. I don't find the master branch is all in right and consistent style, isn't it? And in the case that C-styles or Cpp-style, why I need to change to the other way? And even the static checker wouldn't remind the const thing for c-style. To be perfect, won't const & instead of const * be preferred for cpp-style? Thus, to be perfect, a lot of things need to change. I really appreciate the review work. I'm not a new-bee. In the case that I have explained I can't take advantage of automatic tools right now (b/c my IDE and be short of time). I really don't want to manually make it PERFECT in the current situation. As Dan knows, compare to the style thing, some implementation definitely with faults (e.g. the printf in _K2CudaDebug would delay cuda, but we need it for development), but to be simple and on-time, workable/minimal coding is running here as I observed.

However, I am definitely the guy pursues the clean code (I dare say I know the most tools that help), but with the least man effort and strong requirement from tools to get the goal, instead of all by man write and check. As I planed, I should already be working on cppyy and made it help us at this weekend. But I took much time to be a perfect coder.

I apology for my words. But I insist on tools instead of man should take these work. SORRY to be rude. You would know I am usually a kind man, just sometimes I know there is a better way then I cannot step the road underfoot.

megazone87 · 2020-08-30T18:52:29Z

Sorry to be some rube. @csukuangfj. Found I was kind of unsatisfying with myself's progress. Feel not right now, would hug you next meet. Let's look ahead. Please keep your way, you're right. Thx!!

csukuangfj · 2020-08-30T22:15:32Z

Not everything from NVIDIA is that right. I think you have the ability to change the demo code to your needs instead of just copying and pasting.

…

On Mon, Aug 31, 2020 at 2:18 AM Meixu Song ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In k2/csrc/cuda/debug_test.cu <https://github.com/danpovey/k2/pull/94#discussion_r479800081>: > @@ -56,7 +56,7 @@ __global__ void HelloCUDA(float f) { } // A vector add kernel definition -__global__ void VecAdd(float* A, float* B, float* C) { +__global__ void VecAdd(float *A, float *B, float *C) { I didn't delete things or IDK what you mean. The comment of issue or code, as to the issue it's still above.. As to the code, I post the page of doc: [image: image] <https://user-images.githubusercontent.com/705582/91666541-292d1780-eb30-11ea-9ecc-8209b7f2ad79.png> — You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/danpovey/k2/pull/94#discussion_r479800081>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABIKIPDFMYAAC2QSJ3WRXILSDKJW3ANCNFSM4QLPYQAA> .

danpovey · 2020-08-31T06:33:44Z

k2/csrc/cuda/debug.h

+ * @endcode
+ */
+#define K2_CUDA_CHECK_ERROR(e)                                    \
+  if (::k2::_K2CudaDebug((cudaError_t)(e), __FILE__, __LINE__)) { \


This will evaluate e twice! Not good!

qindazhu · 2020-08-31T07:29:07Z

k2/csrc/cuda/debug.h

+#include <device_launch_parameters.h>
+#include <driver_types.h>
+
+#include <stdio.h>


use cstdio?

danpovey · 2020-08-31T08:22:36Z

Thanks! Merging.

danpovey · 2020-08-31T08:23:17Z

Thanks! Merging.

qindazhu · 2020-08-31T08:23:35Z

k2/csrc/cuda/debug.h

+#include <driver_types.h>
+
+#include <stdio.h>
+#include <assert.h>


Is this header file from cuda? or from c++ library <cassert>? I know there's an assert function in cuda header file, but I don't know how nvcc decides which one to use.

danpovey · 2020-09-01T03:15:56Z

Meixu, I just noticed this is to the wrong branch!
Should be to cuda_draft :-(

@dan

* Cuda draft2 (#86) * Some code drafts * Updates to determinize draft.. * Progress on draft of CUDA stuff (sorry: lots of inconsistencies still) * Rework context/memory management for easier integration with external toolkits * Updates to Context * Add ragged.h * Add shape.h * Various fixes, added compose.cc * implement GetContext with parameter pack (#73) * Various fixes etc. * Update license; various updates on cuda stuff * Add utils header * Update the build system to support Cuda. (#75) * update the build system to support cuda. * add a google colab example for Cuda test. * enable CI for the cuda_draft branch. * resolve some comments. * Updates to compose algo. * Fix couple build issues * move shape to ragged_shape * More progress... * More progress... won't compile * More progress on CUDA draft (wont compile, sorry) * Working on composition... * matrix transpose in cuda (simple version) (#84) * more progress... * add performace test functions (#85) * various progress... * fix build issues (#87) Co-authored-by: Haowen Qiu <[email protected]> Co-authored-by: Fangjun Kuang <[email protected]> * cmake: set "Debug" as default, and add "-O0" for debugging (#83) * cmake: set "Debug" as default, and add "-O0" for debugging * make cmakelists.txt more pratical * change style of cmake custom variables * [WIP] CUDA error checking/debugging (#94) * [WIP] CUDA error checking/debugging add debug.cuh and related changes. * update #1 after review * update * update * trivials * replace get_filename_component, as it rises cmake version warning * add `K2_ASSERT, K2_CHECK_EQ, K2_PARANOID_ASSERT` and update for last review; make some trival changes * typo * trivals * fix a error * typo * Update debug.h * make last change consist with its comment * update to review * update to review * update as Dan required - remove macro redefine guards - replace K2_MAKE_ERROR_CHECK with NDEBUG * changes make gtest stop warn about usage of *_DEATH * remove utility macro about archtecture and the related file. * remove more complex less helpful cmake gtest module, and revert back * fix pointer pos style * continue to remove gtest util * rm special variadic printf for clang as I tried with clang with cuda, and finally the previous error dismissed: "function not viable: requires 2 arguments, but 4 were provided extern int vprintf (const char *__restrict __format, _G_va_list __arg);" * K2_CUDA_SAFE_CALL Never distinguish API or KERNEl call, instead merge into one: K2_CUDA_SAFE_CALL. * fix to review * update to review - fix a twice error code checking issue - change macro name `K2_CUDA_CHECK_ERROR` to `K2_CUDA_PRINT_ERROR` as its effect - add a optional arg `abort` to control abort if hit error - trivals fix * just revert name: `K2_CUDA_CHECK_ERROR` * update to review (#103) * update to review - fix a twice error code checking issue - change macro name `K2_CUDA_CHECK_ERROR` to `K2_CUDA_PRINT_ERROR` as its effect - add a optional arg `abort` to control abort if hit error - trivals fix * just revert name: `K2_CUDA_CHECK_ERROR` * rm two files that @dan deleted on master Co-authored-by: Daniel Povey <[email protected]> Co-authored-by: Haowen Qiu <[email protected]> Co-authored-by: Fangjun Kuang <[email protected]>

[WIP] CUDA error checking/debugging

0e93db9

add debug.cuh and related changes.

qindazhu reviewed Aug 26, 2020

View reviewed changes

danpovey reviewed Aug 26, 2020

View reviewed changes

csukuangfj reviewed Aug 26, 2020

View reviewed changes

danpovey reviewed Aug 27, 2020

View reviewed changes

update #1 after review

2bee88a

megazone87 force-pushed the cuda_check branch from 130ac6f to 2bee88a Compare August 27, 2020 06:41

danpovey reviewed Aug 27, 2020

View reviewed changes

megazone87 added 2 commits August 27, 2020 14:48

update

43d522c

update

070d6f9

qindazhu reviewed Aug 27, 2020

View reviewed changes

trivials

42ea2ea

csukuangfj reviewed Aug 28, 2020

View reviewed changes

megazone87 added 2 commits August 28, 2020 16:36

replace get_filename_component, as it rises cmake version warning

e94837d

add K2_ASSERT, K2_CHECK_EQ, K2_PARANOID_ASSERT

372cfa2

and update for last review; make some trival changes

megazone87 added 2 commits August 30, 2020 17:05

update to review

01b05f6

update to review

34c72d6

danpovey reviewed Aug 30, 2020

View reviewed changes

megazone87 added 4 commits August 30, 2020 19:10

update as Dan required

cb2e222

- remove macro redefine guards - replace K2_MAKE_ERROR_CHECK with NDEBUG

changes make gtest stop warn about usage of *_DEATH

d7afa0e

remove utility macro about archtecture and the related file.

53079ca

remove more complex less helpful cmake gtest module, and revert back

def9a14

megazone87 added 2 commits August 30, 2020 19:50

fix pointer pos style

025f17b

continue to remove gtest util

55790fe

rm special variadic printf for clang

f450fd9

as I tried with clang with cuda, and finally the previous error dismissed: "function not viable: requires 2 arguments, but 4 were provided extern int vprintf (const char *__restrict __format, _G_va_list __arg);"

csukuangfj reviewed Aug 30, 2020

View reviewed changes

danpovey reviewed Aug 30, 2020

View reviewed changes

K2_CUDA_SAFE_CALL

be7c341

Never distinguish API or KERNEl call, instead merge into one: K2_CUDA_SAFE_CALL.

fix to review

ddde0be

danpovey reviewed Aug 31, 2020

View reviewed changes

qindazhu reviewed Aug 31, 2020

View reviewed changes

danpovey merged commit 630fa7e into k2-fsa:master Aug 31, 2020

qindazhu reviewed Aug 31, 2020

View reviewed changes


		// See LICENSE for clarification regarding multiple authors

		#ifndef K2_CSRC_CUDA_ERRORS_H_

[WIP] CUDA error checking/debugging #94

[WIP] CUDA error checking/debugging #94

Conversation

megazone87 commented Aug 26, 2020

megazone87 commented Aug 26, 2020 • edited by csukuangfj Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

megazone87 Aug 26, 2020 • edited Loading

Choose a reason for hiding this comment

megazone87 Aug 26, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

csukuangfj Aug 26, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

megazone87 Aug 26, 2020 • edited Loading

Choose a reason for hiding this comment

megazone87 Aug 28, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danpovey commented Aug 26, 2020 via email

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danpovey commented Aug 26, 2020 via email

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

qindazhu Aug 27, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danpovey commented Aug 28, 2020 via email

qindazhu commented Aug 28, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

megazone87 commented Aug 26, 2020 •

edited by csukuangfj

Loading

megazone87 Aug 26, 2020 •

edited

Loading

megazone87 Aug 26, 2020 •

edited

Loading

csukuangfj Aug 26, 2020 •

edited

Loading

megazone87 Aug 26, 2020 •

edited

Loading

megazone87 Aug 28, 2020 •

edited

Loading

qindazhu Aug 27, 2020 •

edited

Loading

megazone87 commented Aug 30, 2020 •

edited

Loading