Device Abstraction #610

Closed · wants to merge 62 commits

Commits
bc909c6
Wrap the CPU and GPU math functions in math backend classes
kloudkl May 14, 2014
93bee80
Add the math backend to the Layer base class
kloudkl May 14, 2014
b9621c6
Add device type independent getters to Blob
kloudkl May 14, 2014
8697a72
Remove tab from the code and reformat using google style
kloudkl May 15, 2014
6df6109
Allow Layer::Forward and Backward to be overridden
kloudkl May 15, 2014
2883a62
Use zero as the default return values of Blob data and diff methods
kloudkl May 15, 2014
ed71d59
Add and test device type ignorant Forward and Backward in ConcatLayer
kloudkl May 15, 2014
7b1733c
Add default implementations of Layer::Forward_cpu and Backward_cpu
kloudkl May 15, 2014
0f37050
Directly implement device neutral Forward and Backward in ConcatLayer
kloudkl May 15, 2014
13f2f57
Generalize the math backend classes into device wrapper classes
kloudkl May 24, 2014
79325f3
Add Device::copy_from_cpu for the data layers
kloudkl May 25, 2014
67d5621
Unify the CPU and the GPU Forward of the DataLayer
kloudkl May 25, 2014
e57a703
Unify the CPU and the GPU Forward of the ImageDataLayer
kloudkl May 25, 2014
aed27b0
Unify the CPU and the GPU Forward of the HDF5DataLayer
kloudkl May 25, 2014
0703d7b
Unify the CPU and the GPU Forward & Backward of the HDF5OutputDataLayer
kloudkl May 25, 2014
ed7582b
Merge the CPU and the GPU Backward of the data layers
kloudkl May 25, 2014
9b01ff1
Consolidate the CPU and GPU Forward of the WindowDataLayer
kloudkl May 25, 2014
063deba
Deduplicate the CPU and the GPU Forward & Backward of the FlattenLayer
kloudkl May 25, 2014
b945f63
Use the newly implemented caffe_gpu_{add,sub} in the GPU device wrapper
kloudkl May 26, 2014
b8c0ec1
Unify the CPU/GPU Forward/Backward of the SigmoidCrossEntropyLossLayer
kloudkl May 26, 2014
75a71e7
Merge the CPU/GPU Forward/Backward of the SoftmaxWithLossLayer
kloudkl May 26, 2014
df4c2e5
Use {const, mutable}_{data, diff} in the unified Forward/Backward
kloudkl May 26, 2014
716c29d
Unify the CPU/GPU Forward/Backward of the InnerProductLayer
kloudkl May 26, 2014
f14911e
Unify the CPU/GPU Forward/Backward of the SplitLayer
kloudkl May 26, 2014
faef386
Unify the CPU/GPU Forward/Backward of the EltwiseLayer
kloudkl May 26, 2014
740820b
Add im2col and col2im to wrap im2col_{cpu, gpu} and col2im_{cpu, gpu}
kloudkl May 26, 2014
0275bdf
Unify the CPU/GPU versions of Forward/Backward of the ConvolutionLayer
kloudkl May 26, 2014
f70db65
Unify the CPU/GPU versions of Forward/Backward of the Im2colLayer
kloudkl May 26, 2014
ae17c17
Unify the CPU/GPU versions of Forward/Backward of the PowerLayer
kloudkl May 26, 2014
4890b55
Move im2col and col2im into the device wrapper classes
kloudkl May 26, 2014
31fb4b9
Update the include guard of the util/device.hpp
kloudkl Jun 8, 2014
56e23fa
Device wrapper methods no longer pure virtual, default not implemented
kloudkl Jun 18, 2014
d7230b9
Fix the rebase errors introduced when merge conflicts are resolved
kloudkl Jun 18, 2014
6618265
Add current-device data accessors to SyncedMem. Use to fix Convoluti…
Jun 25, 2014
d36ed3b
Fix using wrong device data. Fix tests. All tests pass on a CUDA box.
Jun 26, 2014
99380c9
Fix rebase errors.
Jun 28, 2014
d45100f
Incorporate additional abstractions from 1605431.
Jul 2, 2014
ff9bff0
Move device.hpp up out of util/.
Jul 2, 2014
96ed3c2
Make device factory usage more inline with layer factory.
Jul 2, 2014
e12aa5b
Move devices into their own src dir.
Jul 2, 2014
6db9c8c
Incorporate all cpu math functions directly into CPUDevice.
Jul 7, 2014
1393aa4
Fix compilation errors.
Jul 8, 2014
72b0af1
Fix lint errors.
Jul 8, 2014
9420cea
Fix formatting.
Jul 9, 2014
bddcae8
Try to obey google style guide.
Jul 9, 2014
f721ac6
Replace all `caffe_*()` cpu functions with `CPUDevice` functions.
Jul 12, 2014
8fb944c
Get rid of all `caffe_*` cpu functions.
Jul 14, 2014
af5e292
Move caffe_rng_rand() into util/rng.hpp.
Jul 14, 2014
e1c66dc
Implement all non-kernel gpu functions in GPUDevice and replace corre…
Jul 15, 2014
c125a97
Implement all kernel device functions under GPUDevice. Get rid of unu…
Jul 15, 2014
8d111e4
Implement sqr() and exp() for GPUDevice.
Jul 15, 2014
39c8658
Separate device.hpp into device-specific headers.
Jul 16, 2014
e68a140
Use UVA for GPU copy. Get rid of copy_from_cpu().
Jul 24, 2014
e523b94
Explicitly instantiate/specialize int and unsigned int templates for …
Jul 25, 2014
8bff9ce
Fix sign kernel for unsigned int.
Jul 25, 2014
9a5b6d2
Fix all rebase errors. All tests are passing.
Jul 25, 2014
72e4c21
Remove GPU stubs for all layers with device-unified Forward() and Bac…
Jul 25, 2014
95bf45f
include math.h in filler and solver -- seems to be needed for pow and
jeffdonahue Jul 27, 2014
afd7bf5
Add "#ifndef CPU_ONLY" guards around GPU device code so that CPU_ONLY
jeffdonahue Jul 27, 2014
2e23318
Save warnings when building device files.
jeffdonahue Jul 27, 2014
e30ba4d
Fix post-rebase errors.
jeffdonahue Jul 27, 2014
d52d669
Fix post-rebase errors, including:
jeffdonahue Aug 12, 2014
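
Taken together, these commits replace the free caffe_* (CPU) and caffe_gpu_* (GPU) math functions with polymorphic device wrapper classes, so each layer keeps a single public Forward/Backward pair instead of protected _cpu/_gpu variants. Below is a minimal C++ sketch of the shape the commits converge on; the exact member list and the GetDevice() factory signature are illustrative assumptions, not the PR's verbatim interface.

#include <cstring>

// Stand-in for Caffe::Brew (see the include/caffe/common.hpp diff below).
enum Brew { UNSPECIFIED = -1, CPU, GPU };

// One virtual method per math primitive that layers previously reached
// through the caffe_* (CPU) or caffe_gpu_* (GPU) free functions.
template <typename Dtype>
class Device {
 public:
  virtual ~Device() {}
  virtual void copy(const int n, const Dtype* x, Dtype* y) = 0;
  virtual void axpy(const int n, const Dtype alpha, const Dtype* x,
                    Dtype* y) = 0;
};

template <typename Dtype>
class CPUDevice : public Device<Dtype> {
 public:
  virtual void copy(const int n, const Dtype* x, Dtype* y) {
    std::memcpy(y, x, sizeof(Dtype) * n);  // real code defers to BLAS
  }
  virtual void axpy(const int n, const Dtype alpha, const Dtype* x,
                    Dtype* y) {
    for (int i = 0; i < n; ++i) { y[i] += alpha * x[i]; }
  }
};

// A GPUDevice would implement the same interface with cuBLAS calls and
// custom kernels (commits e1c66dc, c125a97); omitted here.

// Factory made "more inline with layer factory" (commit 96ed3c2).
template <typename Dtype>
Device<Dtype>* GetDevice(const Brew mode) {
  static CPUDevice<Dtype> cpu_device;
  return mode == CPU ? &cpu_device : NULL;  // GPU branch elided
}

Commit 93bee80 attaches such a wrapper to the Layer base class, so unified layer code calls, e.g., device_->copy(...) and lets the wrapper pick between memcpy, BLAS, or a CUDA kernel.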
16 changes: 15 additions & 1 deletion Makefile
@@ -101,6 +101,7 @@ PROTO_OBJS := ${PROTO_GEN_CC:.cc=.o}
OBJ_BUILD_DIR := $(BUILD_DIR)/src/$(PROJECT)
LAYER_BUILD_DIR := $(OBJ_BUILD_DIR)/layers
UTIL_BUILD_DIR := $(OBJ_BUILD_DIR)/util
DEVICE_BUILD_DIR := $(OBJ_BUILD_DIR)/devices
OBJS := $(PROTO_OBJS) $(CXX_OBJS) $(CU_OBJS)
# tool, example, and test objects
TOOL_OBJS := $(addprefix $(BUILD_DIR)/, ${TOOL_SRCS:.cpp=.o})
@@ -179,7 +180,8 @@ endif

ALL_BUILD_DIRS := $(sort \
$(BUILD_DIR) $(LIB_BUILD_DIR) $(OBJ_BUILD_DIR) \
$(LAYER_BUILD_DIR) $(UTIL_BUILD_DIR) $(TOOL_BUILD_DIR) \
$(LAYER_BUILD_DIR) $(UTIL_BUILD_DIR) $(DEVICE_BUILD_DIR) \
$(TOOL_BUILD_DIR) \
$(TEST_BUILD_DIR) $(TEST_BIN_DIR) $(GTEST_BUILD_DIR) \
$(EXAMPLE_BUILD_DIRS) \
$(LINT_OUTPUT_DIR) \
@@ -457,6 +459,12 @@ $(LAYER_BUILD_DIR)/%.o: src/$(PROJECT)/layers/%.cpp $(HXX_SRCS) \
@ cat $@.$(WARNS_EXT)
@ echo

$(DEVICE_BUILD_DIR)/%.o: src/$(PROJECT)/devices/%.cpp $(HXX_SRCS) \
| $(DEVICE_BUILD_DIR)
$(CXX) $< $(CXXFLAGS) -c -o $@ 2> $@.$(WARNS_EXT) \
|| (cat $@.$(WARNS_EXT); exit 1)
@ echo

$(PROTO_BUILD_DIR)/%.pb.o: $(PROTO_BUILD_DIR)/%.pb.cc $(PROTO_GEN_HEADER) \
| $(PROTO_BUILD_DIR)
$(CXX) $< $(CXXFLAGS) -c -o $@ 2> $@.$(WARNS_EXT) \
@@ -483,6 +491,12 @@ $(LAYER_BUILD_DIR)/%.cuo: src/$(PROJECT)/layers/%.cu $(HXX_SRCS) \
@ cat $@.$(WARNS_EXT)
@ echo

$(DEVICE_BUILD_DIR)/%.cuo: src/$(PROJECT)/devices/%.cu $(HXX_SRCS) \
| $(DEVICE_BUILD_DIR)
$(CUDA_DIR)/bin/nvcc $(NVCCFLAGS) $(CUDA_ARCH) -c $< -o $@ 2> $@.$(WARNS_EXT) \
|| (cat $@.$(WARNS_EXT); exit 1)
@ echo

$(UTIL_BUILD_DIR)/%.cuo: src/$(PROJECT)/util/%.cu | $(UTIL_BUILD_DIR)
$(CUDA_DIR)/bin/nvcc $(NVCCFLAGS) $(CUDA_ARCH) -c $< -o $@ 2> $@.$(WARNS_EXT) \
|| (cat $@.$(WARNS_EXT); exit 1)
7 changes: 6 additions & 1 deletion include/caffe/blob.hpp
@@ -4,7 +4,6 @@
#include "caffe/common.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/syncedmem.hpp"
#include "caffe/util/math_functions.hpp"

namespace caffe {

@@ -70,6 +69,12 @@ class Blob {
Dtype* mutable_gpu_data();
Dtype* mutable_cpu_diff();
Dtype* mutable_gpu_diff();

const Dtype* const_data() const;
const Dtype* const_diff() const;
Dtype* mutable_data();
Dtype* mutable_diff();

void Update();
void FromProto(const BlobProto& proto);
void ToProto(BlobProto* proto, bool write_diff = false) const;
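
The hunk above only declares the four device-neutral accessors. A plausible implementation, assuming the bodies (which this diff does not show) dispatch on Caffe::mode():

#include "caffe/blob.hpp"
#include "caffe/common.hpp"

// Assumed implementation of the new mode-agnostic accessors; the real
// bodies live in src/caffe/blob.cpp and may differ in detail.
template <typename Dtype>
const Dtype* Blob<Dtype>::const_data() const {
  switch (Caffe::mode()) {
  case Caffe::CPU:
    return cpu_data();
  case Caffe::GPU:
    return gpu_data();
  default:
    LOG(FATAL) << "Unknown caffe mode.";
    return NULL;
  }
}

template <typename Dtype>
Dtype* Blob<Dtype>::mutable_data() {
  return Caffe::mode() == Caffe::GPU ? mutable_gpu_data()
                                     : mutable_cpu_data();
}

// const_diff() and mutable_diff() follow the same pattern.

With these in place, unified layer code can write bottom[0]->const_data() once instead of choosing between cpu_data() and gpu_data() at every call site.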
2 changes: 1 addition & 1 deletion include/caffe/common.hpp
@@ -105,7 +105,7 @@ class Caffe {
}
return *singleton_;
}
enum Brew { CPU, GPU };
enum Brew { UNSPECIFIED = -1, CPU, GPU };
enum Phase { TRAIN, TEST };


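
The new UNSPECIFIED value gives callers a sentinel for "no device selected". One way such a sentinel is commonly used (purely illustrative; this PR's actual call sites are not visible in this diff):

#include "caffe/common.hpp"

// Hypothetical helper: treat UNSPECIFIED as "defer to the global mode".
inline Caffe::Brew ResolveMode(const Caffe::Brew requested) {
  return requested == Caffe::UNSPECIFIED ? Caffe::mode() : requested;
}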
60 changes: 20 additions & 40 deletions include/caffe/common_layers.hpp
@@ -31,6 +31,11 @@ class ArgMaxLayer : public Layer<Dtype> {
: Layer<Dtype>(param) {}
virtual void SetUp(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual Dtype Forward(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down,
vector<Blob<Dtype>*>* bottom) { NOT_IMPLEMENTED; }

virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_ARGMAX;
@@ -39,12 +44,6 @@ class ArgMaxLayer : public Layer<Dtype> {
virtual inline int ExactNumTopBlobs() const { return 1; }

protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom) {
NOT_IMPLEMENTED;
}
bool out_max_val_;
size_t top_k_;
};
@@ -60,6 +59,10 @@ class ConcatLayer : public Layer<Dtype> {
: Layer<Dtype>(param) {}
virtual void SetUp(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual Dtype Forward(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);

virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_CONCAT;
@@ -68,15 +71,6 @@ class ConcatLayer : public Layer<Dtype> {
virtual inline int ExactNumTopBlobs() const { return 1; }

protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual Dtype Forward_gpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);

Blob<Dtype> col_bob_;
int count_;
int num_;
@@ -95,6 +89,10 @@ class FlattenLayer : public Layer<Dtype> {
: Layer<Dtype>(param) {}
virtual void SetUp(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual Dtype Forward(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);

virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_FLATTEN;
@@ -103,15 +101,6 @@ class FlattenLayer : public Layer<Dtype> {
virtual inline int ExactNumTopBlobs() const { return 1; }

protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual Dtype Forward_gpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);

int count_;
};

@@ -131,16 +120,12 @@ class MVNLayer : public Layer<Dtype> {
virtual inline int ExactNumBottomBlobs() const { return 1; }
virtual inline int ExactNumTopBlobs() const { return 1; }

protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
virtual Dtype Forward(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual Dtype Forward_gpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
virtual void Backward(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);

protected:
Blob<Dtype> mean_, variance_, temp_;

// sum_multiplier is just used to carry out sum using blas
@@ -188,6 +173,10 @@ class SplitLayer : public Layer<Dtype> {
: Layer<Dtype>(param) {}
virtual void SetUp(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual Dtype Forward(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);

virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_SPLIT;
@@ -196,15 +185,6 @@ class SplitLayer : public Layer<Dtype> {
virtual inline int MinTopBlobs() const { return 1; }

protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual Dtype Forward_gpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);

int count_;
};

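
With the protected Forward_cpu/_gpu and Backward_cpu/_gpu pairs deleted, each of these layers implements one public body against the device wrapper. A reconstruction for the simplest case, FlattenLayer, assuming Layer<Dtype> holds a Device<Dtype>* device_ (commit 93bee80) exposing a copy() primitive; the real body lives in src/caffe/layers/flatten_layer.cpp and may differ:

template <typename Dtype>
Dtype FlattenLayer<Dtype>::Forward(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->const_data();
  Dtype* top_data = (*top)[0]->mutable_data();
  // One body serves both modes; with UVA (commit e68a140) a single copy
  // primitive can handle CPU and GPU memory alike.
  this->device_->copy(count_, bottom_data, top_data);
  return Dtype(0.);
}

template <typename Dtype>
void FlattenLayer<Dtype>::Backward(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom) {
  const Dtype* top_diff = top[0]->const_diff();
  Dtype* bottom_diff = (*bottom)[0]->mutable_diff();
  this->device_->copy(count_, top_diff, bottom_diff);
}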