[ROCM] update fluid platform for rocm35 (part1), test=develop #30639
Thanks for your contribution!
Branch updated from 38f330b to b708405.
Branch updated from 2a5a3e6 to 50b5659.
Branch updated from 50b5659 to 6710055.
LGTM for PADDLE_ENFORCE change
inline const char* rocblasGetErrorString(rocblas_status stat) {
  switch (stat) {
    case rocblas_status_invalid_handle:
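Users have reported that when one of our third-party libraries fails, the error shows only a status code without the specific cause. If the official explanation can't be found quickly through a search engine, the user experience suffers. @zhouwei25 will add further enhancements here later, so keep an eye on that.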
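For now, a simplified version of the error messages is fine. Later, if the official documentation provides error descriptions, these AMD errors may also be compressed into cudaerrormessage.pb. That file currently contains only the NVIDIA GPU error content.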
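A simplified version of the messages can still do better than a bare status code. A switch that turns each status into a short reason addresses the complaint raised above. The sketch below is hypothetical throughout. `rocblas_status_stub`, its enumerators, `StatusToString`, `DescribeStatus`, the reason texts and the `rocBLAS error(N): ...` format are illustrative stand-ins, chosen so the code compiles without rocBLAS. The real enum comes from the rocBLAS headers, and the numeric values here are not claimed to match it.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for rocblas_status so the sketch builds without rocBLAS.
enum rocblas_status_stub {
  stub_status_success,
  stub_status_invalid_handle,
  stub_status_not_implemented,
  stub_status_invalid_pointer,
  stub_status_invalid_size,
  stub_status_memory_error,
  stub_status_internal_error,
};

// Map each status to a short human-readable reason (illustrative texts).
inline const char* StatusToString(rocblas_status_stub stat) {
  switch (stat) {
    case stub_status_success:
      return "success";
    case stub_status_invalid_handle:
      return "handle not initialized, invalid or null";
    case stub_status_not_implemented:
      return "function is not implemented";
    case stub_status_invalid_pointer:
      return "invalid pointer argument";
    case stub_status_invalid_size:
      return "invalid size argument";
    case stub_status_memory_error:
      return "failed internal memory allocation, copy or dealloc";
    case stub_status_internal_error:
      return "other internal library failure";
  }
  // Fallback for out-of-range values (e.g. codes from a newer library).
  return "unknown status";
}

// Combine the numeric code with its reason so users see both.
inline std::string DescribeStatus(rocblas_status_stub stat) {
  return "rocBLAS error(" + std::to_string(static_cast<int>(stat)) + "): " +
         StatusToString(stat);
}
```

The switch has no `default` label, so compilers can warn (for example, GCC's `-Wswitch`) when a new enumerator is added without a message. The trailing return still covers out-of-range values.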
paddle/fluid/platform/enforce.h (outdated)
  return webstr.str();
}

inline std::string build_nvidia_error_msg(hipError_t e) {
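This is the unified error interface for NVIDIA's five error types. It maps the information on the official website into "error code + error message" pairs and compresses them into a file named cudaerrormessage.pb. The AMD GPU version could be called build_amd_error_msg. cudaerrormessage.pb currently contains only the NVIDIA part and no AMD entries. You can therefore skip this lookup logic for now, because it would never find anything.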
Changed it to build_rocm_error_msg.
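The lookup described in this thread amounts to a code-to-message table with a fallback. In this sketch, a `std::unordered_map` stands in for the protobuf data in cudaerrormessage.pb; that substitution is an assumption made for illustration. `ErrorTable`, `LookupErrorMessage` and the sample entry are hypothetical. With an empty ROCm table, every lookup misses and returns the runtime's own short string. That is why the review suggests skipping this path for AMD GPUs for now.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical in-memory stand-in for one vendor's entries in
// cudaerrormessage.pb; a plain map replaces the protobuf data here.
using ErrorTable = std::unordered_map<int, std::string>;

// Return the detailed message for `code` if the table has one; otherwise
// fall back to the short string the runtime already provides.
inline std::string LookupErrorMessage(const ErrorTable& table, int code,
                                      const std::string& short_msg) {
  auto it = table.find(code);
  if (it != table.end()) {
    return it->second;
  }
  return short_msg;
}
```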
paddle/fluid/platform/enforce.h (outdated)
/***** HIP ERROR *****/
inline bool is_error(hipError_t e) { return e != hipSuccess; }

inline std::string GetCudaErrorWebsite(int32_t cuda_version) {
This can also be left out for now.
Removed GetCudaErrorWebsite.
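In the `is_error(hipError_t)` overload shown above, each status type gets its own `is_error`. One generic check can then serve every library through overload resolution. The sketch below assumes hypothetical stand-in enums (`FakeHipError`, `FakeBlasStatus`) and a hypothetical `EnforceSuccess` helper. It illustrates the pattern only and is not Paddle's actual `PADDLE_ENFORCE` machinery.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for two vendor status types (e.g. hipError_t, rocblas_status).
enum FakeHipError { fakeHipSuccess = 0, fakeHipErrorInvalidValue = 1 };
enum FakeBlasStatus { fakeBlasSuccess = 0, fakeBlasInvalidSize = 4 };

// One is_error overload per status type, mirroring the HIP snippet.
inline bool is_error(FakeHipError e) { return e != fakeHipSuccess; }
inline bool is_error(FakeBlasStatus s) { return s != fakeBlasSuccess; }

// Generic check: overload resolution selects the matching is_error for
// StatusT, so a single helper covers every library's status type.
template <typename StatusT>
void EnforceSuccess(StatusT status, const char* expr) {
  if (is_error(status)) {
    std::ostringstream sout;
    sout << "Check failed: " << expr << " returned "
         << static_cast<int>(status);
    throw std::runtime_error(sout.str());
  }
}
```

Adding support for another library then only needs one more `is_error` overload, with no change to the generic helper.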
  int32_t cuda_version = -1;
#endif
  std::ostringstream sout;
  sout << " Hip error(" << e << "), " << hipGetErrorString(e) << ".";
For now, just print the hipGetErrorString(e) part. The logic after it can't be triggered at the moment, so there is no need to write it yet.
Removed the error-string logic that followed hipGetErrorString(e).
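With that logic removed, the ROCm message builder reduces to the error code plus the runtime's own error string. This sketch reuses the message format from the snippet above under the name build_rocm_error_msg. It is an illustration, not the merged code. `FakeHipError` and `FakeHipGetErrorString` are hypothetical stand-ins for `hipError_t` and `hipGetErrorString`, so it builds without the HIP runtime. The message texts and numeric values are illustrative.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-ins for hipError_t and hipGetErrorString.
enum FakeHipError { fakeHipSuccess = 0, fakeHipErrorOutOfMemory = 2 };

inline const char* FakeHipGetErrorString(FakeHipError e) {
  switch (e) {
    case fakeHipSuccess:
      return "no error";
    case fakeHipErrorOutOfMemory:
      return "out of memory";
  }
  return "unknown error";
}

// Simplified builder: code plus the runtime's error string only,
// with no cudaerrormessage.pb lookup.
inline std::string build_rocm_error_msg(FakeHipError e) {
  std::ostringstream sout;
  // The unscoped enum promotes to int, so the numeric code is printed.
  sout << " Hip error(" << e << "), " << FakeHipGetErrorString(e) << ".";
  return sout.str();
}
```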
LGTM
PR types
New features
PR changes
Others
Describe
Update paddle fluid platform for rocm35 - part1