Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] CUDA error checking/debugging #94

Merged
merged 24 commits into from
Aug 31, 2020
Merged

Conversation

megazone87
Copy link
Collaborator

add debug.cuh and related changes.

add debug.cuh and related changes.
@megazone87
Copy link
Collaborator Author

megazone87 commented Aug 26, 2020

Guys, it's WIP, but review/advice are welcome.

Todo:

  • write gtest/usages for this
  • replace current CUDA error checks (glog, not functional)

/**
* \brief A static assertion
* \param exp the compile-time boolean expression that must be true
* \param msg an error message if exp is false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment format is not consistent with our conventions, we use @param in other files and no \brief

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The \brief is OK with me as it's likely more correct. I am using a kind of retarded doxygen that I don't attempt to compile.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx! I turn to learn about doxygen and will change to some suggesting conventions in the coming update.

#elif (K2_PTX_ARCH >= 200)
printf("CUDA error ID=%d, NAME=%s, [block (%d,%d,%d) thread (%d,%d,%d), %s, %d]: %s\n",
error, cudaGetErrorName(error),
blockIdx.z, blockIdx.y, blockIdx.x, threadIdx.z, threadIdx.y, threadIdx.x,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's strange that the order is z, y, x instead of x, y, z. and can we add kernel name here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, where this function is supposed to be called? It seems it will be called with return value (cudaError_t error) from some cuda api, if it's not a kernel, how can we get threadIdx and blockIdx here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please give an example of how this would be called?
I want to see what kind of code it's intended for.
I see that there are examples in CUDA API's of device functions that return cudaError_t. But they're probably not common... if this is not intended to be used frequently, please clarify.

Copy link
Collaborator Author

@megazone87 megazone87 Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's strange that the order is z, y, x instead of x, y, z. and can we add kernel name here?

As the CUDA thread/block indexes are column-major distributed. I.e. they are virtual concepts for developers. For the indexing of them, if it's print like (A, B, C), then it implies [a][b][c] as a way to index them. Thus, according to the column-major, a=z, b=y, c=x. However, I can't found a convention to log them. So, either way is ok to me.

Copy link
Collaborator Author

@megazone87 megazone87 Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, where this function is supposed to be called? It seems it will be called with return value (cudaError_t error) from some cuda api, if it's not a kernel, how can we get threadIdx and blockIdx here?

Here, a cuda erro has raised. The error raises from either the host part or the kernel/device part of one cuda runtime-API or kernels.
The K2_PTX_ARCH is used to solve this. During compiling, as to the host part, it's 0; as to the kernel part, it's not.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please give an example of how this would be called?
I want to see what kind of code it's intended for.
I see that there are examples in CUDA API's of device functions that return cudaError_t. But they're probably not common... if this is not intended to be used frequently, please clarify.

OK. I am writing gtests/examples.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the CUDA thread/block indexes are column-major distributed

IIRC,the index is row-major...anyway, I think (x,y,z) is more straightforward to most people, this is also used by most of cuda examples

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The K2_PTX_ARCH is used to solve this. During compiling, as to the host part, it's 0; as to the kernel part, it's not.

Sorry, I don't understand it clearly, Can you explain it with some code? I mean, How can you know it's host code or device code and annotate the code with macro K2_PTX_ARCH ? I did't find the related code in the PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The K2_PTX_ARCH is used to solve this. During compiling, as to the host part, it's 0; as to the kernel part, it's not.

Sorry, I don't understand it clearly, Can you explain it with some code? I mean, How can you know it's host code or device code and annotate the code with macro K2_PTX_ARCH ? I did't find the related code in the PR.

It's defined in cuda/arch.h with comments.

int *unallocated_array = static_cast<int *>(malloc(30 * sizeof(int)));
// device call
{
FillContents<<<2, 3>>>(5, unallocated_array);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this test run successfully? It's weird that it can access host memory in kernel.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

#endif
#endif

} // end namespace k2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to our conventions, there's no end, just } // namespace k2

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend you installing clang-format to format your code automatically on save.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

enable_testing()

function(k2_add_gtest)
cmake_parse_arguments(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cmake code is very confusing, for me and I'm sure the majority of readers . Can you please add some kind of comment that's aimed at someone who doesn't really understand cmake? That explains what implications it has for usage, in terms of what flags you can pass or what behaviors it causes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe I can change another way to reuse the gtest function. But for now, as to add a CMake module doesn't break things, I would like to put it here for convenience.

@@ -0,0 +1,169 @@
// k2/csrc/cuda/errors.h

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention is that we're just calling files .h even if they may contain cuda code, because due to the structure of the library almost every file can potentially instantiate cuda code even if it's not explicitly present. I know this may seem odd, but for me it's convenient as my emacs recognizes as C++, and I think it will make the files seem less scary to people.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


/**
* \brief Macro for checking cuda error.
* If not cudaSuccess and K2_MAKE_ERROR_CHECK, print and exit.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be more precise: "if K2_MAKE_ERROR_CHECK is defined and e is not cudaSuccess"... and clarify that it does nothing otherwise.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

endfunction()

# directly give the suite name source files,
function(k2_add_gtests test_set_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the following template
https://github.com/danpovey/k2/blob/2a6ba24b4a7845bab757fbf7044b0d6253bcb2d8/k2/csrc/CMakeLists.txt#L29-L41
not good enough that you want to make your own?

I find this cmake script hard to read and understand.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's not quite readable. My motivation was to make this function reusable (and could keep gtest-compile out of make process, e.g. only compile and run if make test.). As simply put the above codes into a CMake module, it still requires the sources and dep-libs. If you will, please help make this function reusable?

#else
// A hack to implement the variadic printf as clang for other c-compilers,
// and sielence the warning
#pragma clang diagnostic ignored "-Wc++11-extensions"
Copy link
Collaborator

@csukuangfj csukuangfj Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if a user does not use clang?

What are the purposes of these two pragmas?

If they are really needed, please wrap them with pragma push and pop.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a mistake with the comments. This pragma is for clang, and you are right, a push and pop guard is added.

* https://github.com/NVlabs/cub/blob/c3cceac115c072fb63df1836ff46d8c60d9eb304/cub/util_debug.cuh#L109
*/
#if !defined(K2_DLOG)
#if !(defined(__clang__) && defined(__CUDA__))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation looks weird.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😂 Please give more details. As to me, the auto-format of my IDE had aligned the macros but it made it less readable i think.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I know what you mean. An indent error here is fixed now.

*
* \return The CUDA error.
*/
__host__ __device__ __forceinline__ cudaError_t K2_CUDA_DEBUG(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function K2_CUDA_DEBUG is already inside the namespace k2, so there is
no need to put a k2 in it.

K2_CUDA_DEBUG should be a macro name, not a function name.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K2_CUDA_DEBUG should be a macro name, not a function name

I agree with @csukuangfj here , it's better if we can change K2_CUDA_DEBUG to follow name convention for functions (as it's a function instead of macro), e.g. CudaDebug, as you have defined macros below (e.g. K2_CUDA_CHECK_ERROR) to provide __FILE__ and __LINE__. I suppose function K2_CUDA_DEBUG will not be called directly by users.

BTW, I'm in favour of macros with K2_ prefix.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void CudaDebug(....) {....}

#define K2_CUDA_DEBUG(...) CudaDebug(...)

This is what I meant.

It's fine to have K2 in the macro, but I don't think it is needed in a function which is inside the namespace k2.

Copy link
Collaborator Author

@megazone87 megazone87 Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function K2_CUDA_DEBUG is already inside the namespace k2, so there is
no need to put a k2 in it.

I am not very certain. But if it gets called out of the k2 namespace.. as it's an inline function.

Copy link
Collaborator Author

@megazone87 megazone87 Aug 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K2_CUDA_DEBUG should be a macro name, not a function name

I agree with @csukuangfj here , it's better if we can change K2_CUDA_DEBUG to follow name convention for functions (as it's a function instead of macro), e.g. CudaDebug, as you have defined macros below (e.g. K2_CUDA_CHECK_ERROR) to provide __FILE__ and __LINE__. I suppose function K2_CUDA_DEBUG will not be called directly by users.

BTW, I'm in favour of macros with K2_ prefix.

The inline function name changed from "K2_CUDA_DEBUG" to "_K2DebugLog", "_" indicates it's an inner func.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All functions in the namespace k2, except this one, do not contain k2, so what is the reason for adding k2 to this function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All functions in the namespace k2, except this one, do not contain k2, so what is the reason for adding k2 to this function?

I know what you mean, but you may not try to get my point. For example:

#include <iostream>

namespace k2 {
  inline void Print() {
    std::cout << "Hello World" << std::endl;
  }

#define PP Print()
}

int main () {
  PP;
  return 0;
}

compile and get error:

note: 'k2::Print' declared here
  inline void Print() {
              ^
1 error generated. 

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use

#define PP k2::Print()

The compiler gives a hint that you can use k2::Print in the macro.

@danpovey
Copy link
Collaborator

danpovey commented Aug 26, 2020 via email


// See LICENSE for clarification regarding multiple authors

#ifndef K2_CSRC_CUDA_ERRORS_H_
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please run the cpplint script in the scripts folder before committing.
It can help you spot the incorrect header guard.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I change it manually this time. And will find a way to make it take effect for my IDE.

@danpovey
Copy link
Collaborator

danpovey commented Aug 26, 2020 via email

/**
* @def K2_DLOG
*
* @brief Log macro for printf statements.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify how it is supposed to be used with an example!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Please see the gtest about it. Though I'm afraid �it's not as good as expected, I need to know the exactly expected usages so I can make changes accordingly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@songmeixu please make sure there is an example in the documentation! People won't know to look in the test file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, the usage example is put @code tag as doxygen suggests.

int *unallocated_array = static_cast<int *>(malloc(30 * sizeof(int)));
// device call K2_DLOG
{
helloCUDA<<<1, 5>>>(1.2345f);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent capitalization here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

/* print out 100 evenly spaced square roots just to see that it worked */
unsigned int i;
for (i = 0; i < N; i += (N / 100)) {
K2_DLOG("sqrt(%d) = %lf\n", i, roots[i]);
Copy link
Collaborator

@qindazhu qindazhu Aug 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe something just like assert(sqrt(i) == roots[i]) is better than print values here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gtest EXPECT_EQ is now used to check this.

{
HelloCUDA<<<1, 5>>>(1.2345f);
FillContents<<<2, 3>>>(5, unallocated_array);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

free(unallocated_array)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not to use std::array or std::vector? They manage the memory for you.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this uses unified memory... is this on by default, and does it impose a constraint on compute capability?
I had no plans to make use of unified memory, actually. but OK for testing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it's not unified memory, unified memory should be allocated with cudaMallocManage. BTW, I think we should not use unified memory even in test code, as for our code, all memory should be allocated by using Context.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is to use std::array or std::vector to replace malloc. If it needs to be accessed by GPU,
we can use cudaMalloc and cudaMemcpy to make it available to GPU.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

free(unallocated_array)

Fixed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is to use std::array or std::vector to replace malloc. If it needs to be accessed by GPU,
we can use cudaMalloc and cudaMemcpy to make it available to GPU.

I see. As the device code like C-style data argument, and it's not an important thing.. So, I add the missing free and keep C-style here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether style is an important thing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😂 That what I want to say. The thing is you wanna me change to the other style... and you know it's not an important thing.

@danpovey
Copy link
Collaborator

danpovey commented Aug 28, 2020 via email

@qindazhu
Copy link
Collaborator

So how does the CUDA code successfully write to the CPU memory?

As I comment above, I think the current code can not be run successfully as kernel tries to access host memory directly, I guess this is only WIP test code. Maybe @songmeixu is the best one to answer this.

@@ -176,7 +176,7 @@ __host__ __device__ __forceinline__ cudaError_t K2_CUDA_DEBUG(
*
* @remark
* macro `__VA_ARGS__` is used to pass the kernel<<<...>>> as one argument,
* otherwise the compiler rise a error "passed 2 arguments, but takes just 1".
* otherwise the compiler rises a error "passed 2 arguments, but takes just 1".
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raises an error

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

*/
#if !defined(K2_DLOG)
#if !(defined(__clang__) && defined(__CUDA__))
#if (K2_PTX_ARCH == 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here can just do #ifndef CUDA_ARCH

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

#if !(defined(__clang__) && defined(__CUDA__))
#if (K2_PTX_ARCH == 0)
#define K2_DLOG(format, ...) printf(format,__VA_ARGS__)
#elif (K2_PTX_ARCH >= 200)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here:

 #if (defined(__CUDA_ARCH__ ) && (__CUDA_ARCH__ >= 200)

(the c preprocessor si OK with this). so no need for K2_PTX_ARCH. Keep it simple!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after comma (I'm trying to follow Google C++ style generally.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* @details If in NDEBUG mode, all macros here would do nothing.
*/
#if (!defined(NDEBUG) && !defined(K2_MAKE_ERROR_CHECK))
#define K2_MAKE_ERROR_CHECK
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

..remove this..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@megazone87
Copy link
Collaborator Author

megazone87 commented Aug 30, 2020

According to the above discussion, I get a core principle, that to keep codes simple or minimal for usage. So I made some changes as above.


BTW, I think the comments are too lot. So I am going to delete much of them, especially note/remark that actually are general knowledge or wordy.

@csukuangfj
Copy link
Collaborator

Just want to make sure that any C++ code checked into K2 is of high quality.

If you have read the comment, you can find that a large part of them are about code style. Not sure whether you have
read the google c++ code style guide or have read the K2 source code. If you read your own code carefully, you can find
that even your own code has different styles, e.g., function names, function arguments, placement of const and * for variables, indentation of macros, etc. It is totally fine that you have your own style. But for an existing code base, I think
the code becomes less readable if it contains different styles.

I agree it's not quite readable.
as to add a CMake module doesn't break things, I would like to put it here for convenience.

To get things done is one thing, readability is also very important. You'll make the later maintainer cry.

I've read the code from https://github.com/XiaoMi/mace and I guess people from Xiaomi also care about code readability
and consistency.

@csukuangfj
Copy link
Collaborator

I would be grateful if someone can take his/her time to review my code.

as I tried with clang with cuda, and finally the previous error dismissed: 
"function not viable: requires 2 arguments, but 4 were provided                            
extern int vprintf (const char *__restrict __format, _G_va_list __arg);"
@@ -56,7 +56,7 @@ __global__ void HelloCUDA(float f) {
}

// A vector add kernel definition
__global__ void VecAdd(float* A, float* B, float* C) {
__global__ void VecAdd(float *A, float *B, float *C) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find that you have deleted the comment about adding const for the two input arguments A and B.

The signature of this function tells us A and B will be changed inside the function,
but actually they are not. So it is really confusing for such a signature without const.

I would give a -1 vote.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, should use const where possible.
Also please add the extra checking macros like K2_CHECK_GE and so on.

Copy link
Collaborator Author

@megazone87 megazone87 Aug 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't delete things or IDK what you mean. The comment of issue or code? as to the issue it's still above.. As to the code, IDK what comments. Sorry.

* @endcode
*/
#ifndef __CUDA_ARCH__
#define K2_DLOG(format, ...) printf(format,__VA_ARGS__)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after comma

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Never distinguish API or KERNEl call, instead merge into one: K2_CUDA_SAFE_CALL.
@megazone87
Copy link
Collaborator Author

I do read the K2 code and google style is always my favorite. I made style check as pre-commit/push requirement for one branch about K2 called cppyy in my fork. And I know code review is valuable. I appreciate it. And as many company code repos, the serious code cannot get pass w/o +2 vote after automatic style checking.


However, you must know my point: I don't feel the same standard is work here for everyone every commits. I don't find the master branch is all in right and consistent style, isn't it? And in the case that C-styles or Cpp-style, why I need to change to the other way? And even the static checker wouldn't remind the const thing for c-style. To be perfect, won't const & instead of const * be preferred for cpp-style? Thus, to be perfect, a lot of things need to change. I really appreciate the review work. I'm not a new-bee. In the case that I have explained I can't take advantage of automatic tools right now (b/c my IDE and be short of time). I really don't want to manually make it PERFECT in the current situation. As Dan knows, compare to the style thing, some implementation definitely with faults (e.g. the printf in _K2CudaDebug would delay cuda, but we need it for development), but to be simple and on-time, workable/minimal coding is running here as I observed.


However, I am definitely the guy pursues the clean code (I dare say I know the most tools that help), but with the least man effort and strong requirement from tools to get the goal, instead of all by man write and check. As I planed, I should already be working on cppyy and made it help us at this weekend. But I took much time to be a perfect coder.


I apology for my words. But I insist on tools instead of man should take these work. SORRY to be rude. You would know I am usually a kind man, just sometimes I know there is a better way then I cannot step the road underfoot.

@megazone87
Copy link
Collaborator Author

Sorry to be some rube. @csukuangfj. Found I was kind of unsatisfying with myself's progress. Feel not right now, would hug you next meet. Let's look ahead. Please keep your way, you're right. Thx!!

@csukuangfj
Copy link
Collaborator

csukuangfj commented Aug 30, 2020 via email

* @endcode
*/
#define K2_CUDA_CHECK_ERROR(e) \
if (::k2::_K2CudaDebug((cudaError_t)(e), __FILE__, __LINE__)) { \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will evaluate e twice! Not good!

#include <device_launch_parameters.h>
#include <driver_types.h>

#include <stdio.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use cstdio?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@danpovey
Copy link
Collaborator

Thanks! Merging.

1 similar comment
@danpovey
Copy link
Collaborator

Thanks! Merging.

@danpovey danpovey merged commit 630fa7e into k2-fsa:master Aug 31, 2020
#include <driver_types.h>

#include <stdio.h>
#include <assert.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this header file from cuda? or from c++ library <cassert>? I know there's an assert function in cuda header file, but I don't know how nvcc decides which one to use.

@danpovey
Copy link
Collaborator

danpovey commented Sep 1, 2020

Meixu, I just noticed this is to the wrong branch!
Should be to cuda_draft :-(

danpovey added a commit that referenced this pull request Sep 1, 2020
* Cuda draft2 (#86)

* Some code drafts

* Updates to determinize draft..

* Progress on draft of CUDA stuff (sorry: lots of inconsistencies still)

* Rework context/memory management for easier integration with external toolkits

* Updates to Context

* Add ragged.h

* Add shape.h

* Various fixes, added compose.cc

* implement GetContext with parameter pack (#73)

* Various fixes etc.

* Update license; various updates on cuda stuff

* Add utils header

* Update the build system to support Cuda. (#75)

* update the build system to support cuda.

* add a google colab example for Cuda test.

* enable CI for the cuda_draft branch.

* resolve some comments.

* Updates to compose algo.

* Fix couple build issues

* move shape to ragged_shape

* More progress...

* More progress... won't compile

* More progress on CUDA draft (wont compile, sorry)

* Working on composition...

* matrix transpose in cuda (simple version) (#84)

* more progress...

* add performace test functions (#85)

* various progress...

* fix build issues (#87)

Co-authored-by: Haowen Qiu <[email protected]>
Co-authored-by: Fangjun Kuang <[email protected]>

* cmake: set "Debug" as default, and add "-O0" for debugging (#83)

* cmake: set "Debug" as default, and add "-O0" for debugging

* make cmakelists.txt more pratical

* change style of cmake custom variables

* [WIP] CUDA error checking/debugging (#94)

* [WIP] CUDA error checking/debugging

add debug.cuh and related changes.

* update #1 after review

* update

* update

* trivials

* replace get_filename_component, as it rises cmake version warning

* add `K2_ASSERT, K2_CHECK_EQ, K2_PARANOID_ASSERT` 

and update for last review; make some trival changes

* typo

* trivals

* fix a error

* typo

* Update debug.h

* make last change consist with its comment

* update to review

* update to review

* update as Dan required

- remove macro redefine guards
- replace K2_MAKE_ERROR_CHECK with NDEBUG

* changes make gtest stop warn about usage of *_DEATH

* remove utility macro about archtecture  and the related file.

* remove more complex less helpful cmake gtest module, and revert back

* fix pointer pos style

* continue to remove gtest util

* rm special variadic printf for clang 

as I tried with clang with cuda, and finally the previous error dismissed: 
"function not viable: requires 2 arguments, but 4 were provided                            
extern int vprintf (const char *__restrict __format, _G_va_list __arg);"

* K2_CUDA_SAFE_CALL

Never distinguish API or KERNEl call, instead merge into one: K2_CUDA_SAFE_CALL.

* fix to review

* update to review

- fix a twice error code checking issue
- change macro name `K2_CUDA_CHECK_ERROR` to `K2_CUDA_PRINT_ERROR` as its effect
- add a optional arg `abort` to control abort if hit error
- trivals
fix

* just revert name: `K2_CUDA_CHECK_ERROR`

* update to review (#103)

* update to review

- fix a twice error code checking issue
- change macro name `K2_CUDA_CHECK_ERROR` to `K2_CUDA_PRINT_ERROR` as its effect
- add a optional arg `abort` to control abort if hit error
- trivals
fix

* just revert name: `K2_CUDA_CHECK_ERROR`

* rm two files that @dan deleted on master

Co-authored-by: Daniel Povey <[email protected]>
Co-authored-by: Haowen Qiu <[email protected]>
Co-authored-by: Fangjun Kuang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants