Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement DeviceContext #2709

Merged

Conversation

QiJune
Copy link
Member

@QiJune QiJune commented Jul 3, 2017

Eigen::DefaultDevice* eigen_device_{nullptr};
};

#ifndef PADDLE_ONLY_CPU
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here to add PADDLE_ONLY_CPU will bring a problem, the code that calls DeviceGuard or CudaDeviceContext needs to be separated by PADDLE_ONLY_CPU.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code that calls DeviceGuard or CudaDeviceContext must have WITH_GPU set 1.
Yes, this brings a question, how we organize our CPU/GPU codes clearly.
We can use marco, or make fake stub header file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a few things to consider when dealing with GPU and CPU mixed code.

  1. If not necessary, try not to put the GPU and CPU code in a file. In this way, you do not need to use an extra macro to separate the code. (I think, context.h can only contain cpu context, cuda_context.h can contain gpu context.)
  2. Do not use PADDLE_ONLY_CPU, should be replaced by PADDLE_WITH_CUDA. The default should be the CPU code, and when need to use CUDA code, add PADDLE_WITH_CUDA.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this suggestion is useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge this pr temporarily. And I will consider the design of DeviceContext combining with Operator interface. And I will follow advices of @hedaoyuan later.

Eigen::DefaultDevice eigen_device() {
if (!eigen_device_) {
eigen_device_ = new Eigen::DefaultDevice();
}
Copy link
Contributor

@qingqing01 qingqing01 Jul 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where use the Eigen::DefaultDevice in our design? I find the Eigen::DefaultDevice in the directory of eigen/unsupported/Eigen/CXX11/src/Tensor, but I do not find the usage in the Tensor's doc of Eigen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eigen::DefaultDevice is defined in unsupported/Eigen/CXX11/src/Tensor/TensorDeviceDefault.h.
About the usage of Eigen::DefaultDevice, please refer to (https://github.com/QiJune/RefEigen/blob/master/main.cu)

paddle::platform::throw_on_error(cudaStreamCreate(&stream_),
"cudaStreamCreate failed");
eigen_stream_ = new Eigen::CudaStreamDevice(&stream_);
eigen_device_ = new Eigen::GpuDevice(eigen_stream_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we will use the CUDA implementation in Eigen. If not decide to use it, I think the eigen_stream_ and eigen_device_ can be removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do not use CUDA implementation in Eigen, then we will write CUDA kernels for every operators. Just like caffe2.
And tensorflow use CUDA implementation in Eigen. @hedaoyuan once mentioned the efficiency of expression template of Eigen in GPU is acceptable.
So, we may have a discussion about this offline.

#include "paddle/framework/enforce.h"
#include "paddle/platform/dynload/cublas.h"
#include "paddle/platform/dynload/cudnn.h"
#include "paddle/platform/dynload/curand.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above three lines should also be included between #ifndef PADDLE_ONLY_CPU and #endif.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, logically above three header files should be included between macros.


#ifndef PADDLE_ONLY_CPU
#include "paddle/platform/cuda.h"
#define EIGEN_USE_GPU
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the EIGEN_USE_GPU used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EIGEN_USE_GPU is used by eigen library. If we want to use Tensor Expression of eigen in GPU, we have to define this marco.

Copy link
Collaborator

@wangkuiyi wangkuiyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea of a super simple engine.

virtual ~DeviceContext() {}
};

class CpuDeviceContext : public DeviceContext {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cpu => CPU

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

GPUPlace previous_;
};

class CudaDeviceContext : public DeviceContext {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cuda => CUDA

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@QiJune QiJune force-pushed the feature/implement_DeviceContext branch from baed7b1 to 39679d5 Compare July 6, 2017 06:32
class CPUDeviceContext : public DeviceContext {};

#ifndef PADDLE_ONLY_CPU
class DeviceGuard {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems GPUPlaceGuard? not DeviceGuard, because it takes GPUPlace as argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually guard the GPUPlace.device. Since we pass GPUPlace, maybe GPUPlaceGuard is a more clear name

"cudaStreamSynchronize failed");
}

cudaStream_t stream() { return stream_; }
Copy link
Collaborator

@reyoung reyoung Jul 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lake of const for all methods?

But maybe it is not important because all device context is a mutable pointer passed to Op::Run.

cublasHandle_t cublas_handle() {
if (!blas_handle_) {
DeviceGuard guard(gpu_place_);
PADDLE_ENFORCE(paddle::platform::dynload::cublasCreate(&blas_handle_) ==
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tooooooooo long for the namespace.

Maybe we can add using namespace paddle::platform; in this class private section, like

class GPUDeviceContext {
private:
  using namespace paddle::platform;   // only use namespace in this class.
};

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe alias is better, like

using dynload = paddle::platform::dynload;

Copy link
Member Author

@QiJune QiJune Jul 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that we cannot add an alias or using namespace inside a class

for (int i = 0; i < count; i++) {
paddle::platform::CUDADeviceContext* device_context =
new paddle::platform::CUDADeviceContext(i);
__attribute__((unused)) Eigen::GpuDevice gpu_device =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not use unused attribute, because it may fail on some compiler.

Maybe the return value does not need to store. What about

ASSERT_NE(nullptr, device_context->eigen_device());

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@QiJune QiJune force-pushed the feature/implement_DeviceContext branch from e3f0db4 to c7bdbdb Compare July 6, 2017 09:43
Copy link
Member

@jacquesqiao jacquesqiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, can seperate cpu code and gpu code later.

@jacquesqiao jacquesqiao merged commit 1038bc4 into PaddlePaddle:develop Jul 10, 2017
@QiJune QiJune deleted the feature/implement_DeviceContext branch July 10, 2017 05:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants