[amd] convolution kernel didn't reuse the algorithm found. #11203
Comments
The result:
I0605 10:56:09.376890 34846 tensor_util.cu:22] TensorCopy 3 from CUDAPlace(0) to CPUPlace
I0605 10:56:09.376969 34846 conv_cudnn_op.cu.cc:151] Find Kernel: load 0x7f0044659080 kernel :3
In the backward op, all inputs and outputs of the forward op are available. You need to customize the GradOpMaker (used on the Python side to create the OpDesc).
class Conv2DGradMaker : public framework::SingleGradOpDescMaker {
 public:
  using framework::SingleGradOpDescMaker::SingleGradOpDescMaker;

 protected:
  std::unique_ptr<framework::OpDesc> Apply() const override {
    auto* op = new framework::OpDesc();
    op->SetType("conv2d_grad");
    // Forward inputs, including the cached algorithm choice.
    op->SetInput("Input", Input("Input"));
    op->SetInput("Filter", Input("Filter"));
    op->SetInput("Algorithm", Input("Algorithm"));
    op->SetInput(framework::GradVarName("Output"), OutputGrad("Output"));
    op->SetAttrMap(Attrs());
    // Forward the algorithm output so the backward op can reuse it.
    op->SetOutput("AlgorithmOut", Output("AlgorithmOut"));
    op->SetOutput(framework::GradVarName("Input"), InputGrad("Input"));
    op->SetOutput(framework::GradVarName("Filter"), InputGrad("Filter"));
    return std::unique_ptr<framework::OpDesc>(op);
  }
};
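Hello, this issue has had no updates in the past month, so we will close it today. If you still need to follow up after it is closed, you can reopen the issue and we will reply within 24 hours. We apologize for any inconvenience the closure causes and ask for your understanding. Thank you for supporting PaddlePaddle!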
My PR fixes the issue above: https://github.com/dzhwinter/Paddle/tree/review_conv2d_1
The cudnn op runs on the CUDA device, so its inputs and outputs must stay on the CUDA device. ROCm#16 uses a CPU Tensor to store the selected algorithm, but our framework automatically transforms it into a temporary GPU Tensor. As a result, the cudnn op cannot reach the real persistent Tensor from inside the kernel, so the algorithm it writes is lost when the temporary is discarded.
If we allocate the input and output on the GPU and copy the result to the CPU, we get the correct result.
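To make the failure mode concrete, here is a minimal, self-contained sketch of the behavior described above. It does not use PaddlePaddle: `Tensor`, `Scope`, `RunWithAutoTransform`, `ConvKernel`, and `CountSearches` are hypothetical stand-ins, and the algorithm id `3` is an arbitrary placeholder. It only models the assumption that a framework copies a CPU-placed variable into a temporary device tensor before running a GPU kernel.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration; these are not PaddlePaddle types.
enum class Place { kCPU, kGPU };

struct Tensor {
  Place place;
  int value;  // cached algorithm id; -1 means "no algorithm selected yet"
};

using Scope = std::map<std::string, Tensor>;

int g_search_count = 0;  // how many times the expensive search ran

// Pretend algorithm search. The returned id is arbitrary.
int SearchAlgorithm() {
  ++g_search_count;
  return 3;
}

// Kernel body: reuse the cached algorithm if present, otherwise search and store it.
void ConvKernel(Tensor* algo) {
  assert(algo->place == Place::kGPU);  // a GPU kernel only sees GPU tensors
  if (algo->value < 0) algo->value = SearchAlgorithm();
}

// Models a framework that transforms CPU variables into temporary GPU copies
// before launching a GPU kernel. Writes to the temporary never reach the scope.
void RunWithAutoTransform(Scope* scope, const std::string& name) {
  Tensor& var = scope->at(name);
  if (var.place == Place::kCPU) {
    Tensor tmp{Place::kGPU, var.value};
    ConvKernel(&tmp);  // result lands in tmp, which is discarded here
  } else {
    ConvKernel(&var);  // result lands in the persistent variable
  }
}

// Runs the op `steps` times and reports how often the search was repeated.
int CountSearches(Place algo_place, int steps) {
  g_search_count = 0;
  Scope scope;
  scope.emplace("Algorithm", Tensor{algo_place, -1});
  for (int i = 0; i < steps; ++i) RunWithAutoTransform(&scope, "Algorithm");
  return g_search_count;
}
```

In this model, `CountSearches(Place::kCPU, 5)` repeats the search on every step because each write goes to a discarded temporary, while `CountSearches(Place::kGPU, 5)` searches once and then reuses the stored value. That mirrors the fix suggested above: keep the algorithm tensor on the device and copy it to the CPU explicitly only when the host needs to read it.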