Centralize the use of simd intrinsic and implement scalar kernels. #2299
2. Centralize the use of SSE and NEON intrinsics.
3. Disable NEON intrinsics when GPU is enabled.
Also, please check whether the basic operations add, mul, sub and div in hl_cpu_scalar.cuh, hl_cpu_simd_sse.cuh and hl_cpu_simd_neon.cuh can be implemented in hl_tensor_ops.h instead.
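To make the suggestion concrete, here is a minimal sketch of what type-templated basic operations could look like. It is not the actual content of hl_tensor_ops.h; the namespace `ops_sketch`, the functor layout and the `apply` helper are assumptions made up for illustration.

```cpp
// Hypothetical sketch only: these names do not come from Paddle.
namespace ops_sketch {
namespace binary {

// One functor per basic operation, templated on the element type so the
// same definition can be instantiated for float, double, or a vector type
// that overloads the arithmetic operators.
template <class T>
struct add {
  T operator()(const T a, const T b) const { return a + b; }
};

template <class T>
struct sub {
  T operator()(const T a, const T b) const { return a - b; }
};

template <class T>
struct mul {
  T operator()(const T a, const T b) const { return a * b; }
};

template <class T>
struct div {
  T operator()(const T a, const T b) const { return a / b; }
};

}  // namespace binary

// Applies a binary functor elementwise over two arrays of length n.
template <class T, class Op>
void apply(const T* a, const T* b, T* out, int n, Op op) {
  for (int i = 0; i < n; ++i) {
    out[i] = op(a[i], b[i]);
  }
}

}  // namespace ops_sketch
```

With this layout the scalar and SIMD headers would only need to supply the element or vector type, and the four operations would live in one place.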
#elif defined(__SSE3__)
#include "hl_cpu_simd_sse.cuh"
#elif (defined(__ARM_NEON) || defined(__ARM_NEON__)) && !defined(__NVCC__)
#include "hl_cpu_simd_neon.cuh"
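Please add a comment here explaining why the __NVCC__ macro was added.
Also, in an ARM + GPU environment the ARM side cannot use NEON instructions. A TODO should be added, since this problem still needs to be fixed later.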
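A rough, self-contained sketch of how the commented guard could read is shown below. The real chain includes headers; here each branch only defines a hypothetical macro `SKETCH_CPU_BACKEND` so the snippet compiles on its own. The scalar fallback in the `#else` branch is an assumption, and the thread does not state the underlying reason for excluding nvcc beyond "disable NEON when GPU is enabled".

```cpp
#include <string>

// Hypothetical stand-in for the include chain in the diff.
#if defined(__SSE3__)
#define SKETCH_CPU_BACKEND "sse"
// __NVCC__ is defined whenever nvcc compiles this file, i.e. in a GPU build.
// The NEON path is disabled in that case (item 3 of this PR).
// TODO: ARM + GPU builds cannot use NEON on the CPU side yet; fix this later.
#elif (defined(__ARM_NEON) || defined(__ARM_NEON__)) && !defined(__NVCC__)
#define SKETCH_CPU_BACKEND "neon"
#else
// Assumption: everything else falls back to the scalar kernels.
#define SKETCH_CPU_BACKEND "scalar"
#endif

// Reports which branch the preprocessor selected for this translation unit.
inline std::string cpu_backend() { return SKETCH_CPU_BACKEND; }
```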
Done
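Also, Paddle still has many kernels implemented with AVX that currently have no NEON implementation. I added this wrapper layer over the basic operations partly in the hope that the wrapped interfaces can later replace the direct calls.
One more issue: the current implementation does not account for selecting the instruction set dynamically at runtime. Would it be better not to unify the names?
It is not three versions; only one version is implemented. hl_tensor_ops.cuh defines template classes, which can be instantiated with different parameter types.
It just has not been implemented yet.
How is that related? Without unified names, would the call site distinguish them with if/else?
I see. It does not matter then: with unified names, the versions can be distinguished by type.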
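One way to read this conclusion is that a single name such as `add` can be overloaded on the vector type, so generic kernels never branch on the instruction set by name. The sketch below illustrates that idea only; `Scalar1`, `Wide4` and `sum3` are hypothetical, and `Wide4` is a plain array standing in for a real SIMD register type.

```cpp
namespace naming_sketch {

// Hypothetical one-lane "vector" for the scalar path.
struct Scalar1 {
  float v;
};

// Hypothetical four-lane vector; a stand-in for a SIMD register type.
struct Wide4 {
  float v[4];
};

// Same name, different argument types: overload resolution picks the version.
inline Scalar1 add(Scalar1 a, Scalar1 b) {
  Scalar1 r;
  r.v = a.v + b.v;
  return r;
}

inline Wide4 add(Wide4 a, Wide4 b) {
  Wide4 r;
  for (int i = 0; i < 4; ++i) {
    r.v[i] = a.v[i] + b.v[i];
  }
  return r;
}

// A generic kernel written once against the unified name.
template <class Vec>
Vec sum3(Vec a, Vec b, Vec c) {
  return add(add(a, b), c);
}

}  // namespace naming_sketch
```

If runtime selection is added later, the dispatcher would only need to choose which `Vec` to instantiate; the kernel body stays the same.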
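Let's leave it like this for now.
Later, vecType should be changed to a template class like the one below. That would remove the need for the PADDLE_TYPE_DOUBLE macro and allow extension to types such as int.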
template<class T, int size>
struct Packet;
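As a rough illustration of the reviewer's idea, a generic `Packet` could start as a fixed-size array with elementwise operations. The member name `lanes`, the namespace `packet_sketch` and the `add` helper are assumptions; a real version would presumably add specializations that wrap the SSE or NEON register types.

```cpp
namespace packet_sketch {

// Hypothetical generic packet: `size` lanes of element type T.
// Choosing T as a template parameter is what would make a macro such as
// PADDLE_TYPE_DOUBLE unnecessary, and it also admits integer lanes.
template <class T, int size>
struct Packet {
  T lanes[size];
};

// Elementwise addition for any element type and lane count.
template <class T, int size>
Packet<T, size> add(const Packet<T, size>& a, const Packet<T, size>& b) {
  Packet<T, size> r;
  for (int i = 0; i < size; ++i) {
    r.lanes[i] = a.lanes[i] + b.lanes[i];
  }
  return r;
}

}  // namespace packet_sketch
```

For example, `Packet<float, 4>` and `Packet<double, 2>` would both describe a 128-bit register, while `Packet<int, 4>` covers the integer case mentioned in the comment.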
Major modifications are listed as follows:
1. Centralize the use of SSE and NEON intrinsics in hl_cpu_simd_sse.cuh and hl_cpu_simd_neon.cuh.
2. Implement scalar kernels in hl_cpu_scalar.cuh.
As a result, we no longer need hl_matrix_base_[sse/neon].cuh and hl_[sse/neon]_matrix_kernel.cuh, which are almost the same.