"Serialize LoDTensor, Save/Restore model" #4602

Merged 34 commits on Oct 24, 2017. Changes shown are from 14 of the 34 commits.

Commits:
fa5b154
"add model format design doc"
dzhwinter Oct 5, 2017
5c617f6
"add restore function"
dzhwinter Oct 5, 2017
e63a2b3
"add parse protobuf"
dzhwinter Oct 6, 2017
c6be3f3
Merge branch 'develop' into feature/checkpoint
dzhwinter Oct 6, 2017
425a6b6
"move necessary information to saver.proto"
dzhwinter Oct 6, 2017
2f8eb95
"format code"
dzhwinter Oct 6, 2017
f69e444
"add gpu option"
dzhwinter Oct 7, 2017
5a6e6b2
"add lod info"
dzhwinter Oct 9, 2017
d111c7a
"add saveop python test wrapper"
dzhwinter Oct 9, 2017
70786ee
"checkpoint reuse save operator"
dzhwinter Oct 9, 2017
1f31265
"rewrite model format design doc"
dzhwinter Oct 9, 2017
1e92448
"async support needed"
dzhwinter Oct 9, 2017
6d24d23
"fix run once"
dzhwinter Oct 10, 2017
e5974d1
"fix doc based on comments"
dzhwinter Oct 10, 2017
d756fe1
"refine based on comments"
dzhwinter Oct 10, 2017
54819cc
"fix based comments"
dzhwinter Oct 10, 2017
8fdca7d
merge into develop
dzhwinter Oct 20, 2017
7407afd
"remove persistable flag from framework.proto"
dzhwinter Oct 20, 2017
15fd027
"add IndicateDataType to restore op"
dzhwinter Oct 21, 2017
a776dfc
Merge remote-tracking branch 'origin/develop' into feature/checkpoint
dzhwinter Oct 21, 2017
d7e25aa
"add save test"
dzhwinter Oct 22, 2017
a05883f
"modify save restore code"
dzhwinter Oct 22, 2017
f918bfc
"modified the restore logic"
dzhwinter Oct 23, 2017
feb23f4
rm checkpoint_op.cc
JiayiFeng Oct 23, 2017
c5e6307
rm test_checkpoint_op.py
JiayiFeng Oct 23, 2017
69db65d
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
JiayiFeng Oct 23, 2017
0961597
"get inputs outputs name from execution context"
dzhwinter Oct 23, 2017
78b24a6
Merge branch 'fix/add_get_name' of https://github.com/dzhwinter/Paddl…
JiayiFeng Oct 23, 2017
9e8ddc1
Saving each variable to a independent file
JiayiFeng Oct 23, 2017
e1c1e2c
Fix bugs
JiayiFeng Oct 23, 2017
4bc80d9
Rewrite save_restore_op_test with new Python framework
JiayiFeng Oct 24, 2017
4150e27
Move `SaveOp` and `RestoreOp` from OpWithKernel to OpBase
JiayiFeng Oct 24, 2017
7fdc536
Refine unit test of SaveOp and RestoreOp
JiayiFeng Oct 24, 2017
ad08120
fix compile errorwq
JiayiFeng Oct 24, 2017
37 changes: 37 additions & 0 deletions doc/design/model_format.md
@@ -0,0 +1,37 @@
# Design Doc: Model Format

## Motivation

The model is the output of the training process. One complete model consists of two parts: the **topology** and the **parameters**. To support business deployment, the model format must be self-contained and must not expose any training source code.

Comment (Reviewer): maybe business deployment => industrial deployment?

Reply (Contributor Author): Done.


As a result, in PaddlePaddle, the **topology** is represented as a [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/doc/design/program.md), which describes the model structure. The **parameters** contain all the trainable weights in the model; we must support large parameter sizes and efficient serialization/deserialization.

## Implementation

The topology is saved as plain text; in detail, a self-contained protobuf file.

The parameters are saved as a binary file. As we all know, a protobuf message has a size limit of [64M](https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.coded_stream#CodedInputStream.SetTotalBytesLimit.details). We did a [benchmark experiment](https://github.com/PaddlePaddle/Paddle/pull/4610), and its result shows that protobuf is not a good fit for this scenario.

As a result, we design a particular format for tensor serialization. By default, an arbitrary tensor in Paddle is a [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) and has a description proto, [LoDTensorDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L99). We save this desc proto as the byte-string header; it contains necessary information such as the `dims` and `name` of the tensor, and the `LoD` information in [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/paddle/framework/lod_tensor.md). A tensor stores its value in a continuous memory buffer; for speed, we dump the raw memory to disk and save it as the byte-string content. So the binary format of one tensor is:

|HeaderLength|ContentLength|**LoDTensorDesc**|**TensorValue**|

In detail, the tensor's byte view is as the table below shows. Note that all signed values are written in little-endian.

Comment (Contributor Author): In fact, I think protobuf may be a better choice than the current design. To break through the 64M limitation, we can divide a big parameter into small ones, so the only concern left is the packed parameter size. According to @helinwang's comment, every repeated message carries a small tag, which does not fit chunked data well. We will do some benchmark experiments and choose the better one.

```text
[offset]  [type]            [description]
0000      32 bit integer    version number
0004      32 bit integer    HeaderLength, the length of LoDTensorDesc
0008      64 bit integer    ContentLength, the length of the LoDTensor buffer
0016      8 bit char        TensorDesc
0017      8 bit char        TensorDesc
...
00100     8 bit char        TensorValue
00101     8 bit char        TensorValue
00102     8 bit char        TensorValue ..
...
```

## Summary

We introduced the model format: a `ProgramDesc` describes the **topology**, and a bunch of binary tensors in the particular format above describe the **parameters**.
8 changes: 5 additions & 3 deletions paddle/framework/CMakeLists.txt
@@ -1,4 +1,7 @@
# ddim lib
proto_library(framework_proto SRCS framework.proto)
proto_library(saver_proto SRCS framework.proto saver.proto)

cc_library(ddim SRCS ddim.cc DEPS eigen3)
cc_test(ddim_test SRCS ddim_test.cc DEPS ddim)
nv_test(dim_test SRCS dim_test.cu DEPS ddim)
@@ -7,16 +10,15 @@ cc_library(tensor SRCS tensor.cc DEPS ddim place paddle_memory device_context)
cc_test(tensor_test SRCS tensor_test.cc DEPS tensor)
cc_test(eigen_test SRCS eigen_test.cc DEPS tensor)

cc_library(lod_tensor SRCS lod_tensor.cc DEPS ddim place tensor)
cc_test(lod_tensor_test SRCS lod_tensor_test.cc DEPS lod_tensor)
cc_library(lod_tensor SRCS lod_tensor.cc DEPS ddim place tensor saver_proto framework_proto)
cc_test(lod_tensor_test SRCS lod_tensor_test.cc DEPS lod_tensor paddle_memory)
nv_test(lod_tensor_gpu_test SRCS lod_tensor_test.cu DEPS lod_tensor)

cc_test(variable_test SRCS variable_test.cc)

cc_library(scope SRCS scope.cc)
cc_test(scope_test SRCS scope_test.cc DEPS scope)

proto_library(framework_proto SRCS framework.proto)

cc_library(attribute SRCS attribute.cc DEPS framework_proto)
cc_library(proto_desc SRCS var_desc.cc op_desc.cc block_desc.cc program_desc.cc DEPS attribute)
1 change: 1 addition & 0 deletions paddle/framework/framework.proto
@@ -106,6 +106,7 @@ message LoDTensorDesc {
message VarDesc {
required string name = 1;
optional LoDTensorDesc lod_tensor = 2;
optional bool persistable = 3 [ default = false ];
}

message BlockDesc {
142 changes: 142 additions & 0 deletions paddle/framework/lod_tensor.cc
@@ -13,6 +13,14 @@
limitations under the License. */

#include "paddle/framework/lod_tensor.h"
#include "paddle/framework/saver.pb.h"

Comment (Member): add a blank line under #include "paddle/framework/lod_tensor.h"

Reply (Contributor Author): Done.

#include "paddle/memory/memcpy.h"
#include "paddle/memory/memory.h"

#include <stdint.h>
#include <string.h>
#include <algorithm>
#include <iterator>

#include <glog/logging.h>

@@ -103,5 +111,139 @@ void LoDTensor::ShrinkInLevel(size_t level, size_t elem_begin,
lod_ = new_lod;
}

std::string LoDTensor::SerializeToString() const {

Comment (Contributor): Make SerializeToString an external function, e.g. SerializeToString(LoDTensor). There may be more such serialization functions, such as SerializeToString(OperatorBase); do not change the definition of the original class. It is better not to insert methods that have no relation to computation into class LoDTensor — LoDTensor serves as a concept for computation; keep it clean. Also, this function is too long; break it up and keep the code clean. If so much code is added that has no relation to the definition or operation of the LoD or Tensor concepts, placing it inside namespace detail or in another source file is better.

Reply (Contributor Author): After a face-to-face talk with @Superjom, my opinion on this question is as follows.

  1. Currently, we only need to serialize the in-memory content into a byte stream, namely SerializeToString(LoDTensor) and SerializeToString(Tensor). OperatorBase and the other concepts all have their Desc in protobuf, so we do not need serialization implementations for any other class.

  2. If we implement DeserializeFromString to return a Tensor filled with values without binding the serialize interface to the Tensor instance, we need another copy of the Tensor.

  3. The offset_ and type in Tensor are hidden; we need to figure them out.

Thanks for this comment!

Comment (Contributor): DeserializeFromString(LoDTensor*) has no need to copy a Tensor; filling the data in-place seems possible.

LoDTensorProto desc;

// set data_type
if (this->type() == typeid(int8_t)) desc.set_data_type(DataType::BOOL);
if (this->type() == typeid(int16_t)) desc.set_data_type(DataType::INT16);
if (this->type() == typeid(int32_t)) desc.set_data_type(DataType::INT32);
if (this->type() == typeid(int64_t)) desc.set_data_type(DataType::INT64);
// FIXME(dzh): there is no fp16 in standard c++

Comment (@Superjomn, Oct 10, 2017): Something like

    map<const typeinfo&, DataType> typeid2datatype = {
      {typeid(int8_t), DataType::BOOL},
      {typeid(int16_t), DataType::INT16},
      ....
    };

    desc.set_data_type(typeid2datatype[this->type()]);

Put this map outside somewhere, perhaps as a member of a struct. It can be reused and makes the code cleaner.


if (this->type() == typeid(float)) // NOLINT
desc.set_data_type(DataType::FP32);
if (this->type() == typeid(double)) // NOLINT
desc.set_data_type(DataType::FP64);

// set dims
std::vector<int64_t> dims = vectorize(this->dims());
for (auto& dim : dims) {

Comment (Member): Maybe

    for (int i = 0; i < dims().size(); ++i) {
        desc.add_dims(dims()[i]);
    }

is enough; there is no need to create a new vector.

Reply (Contributor Author): Done.

desc.add_dims(dim);
}

// set lod information
desc.set_lod_level(this->NumLevels());
for (size_t i = 0; i < this->NumLevels(); ++i) {
LoDInfo* lod = desc.add_levels();
for (size_t j = 0; j < lod_[i].size(); ++j) {
lod->add_level(lod_[i][j]);
}
}

// set place information
platform::Place place = holder_->place();

std::string desc_bytes = desc.SerializeAsString();

// FIXME(dzh) : implement fix chunk size buffer.
size_t DESC_SIZE = desc_bytes.size();
size_t DATA_SIZE = holder_->size() - offset_;

const size_t BUFFER_SIZE = DESC_SIZE + DATA_SIZE + 2 * sizeof(size_t);
char* buffer =
static_cast<char*>(memory::Alloc(platform::CPUPlace(), BUFFER_SIZE));

// format: desc_size data_size, desc_bytes, data_bytes.
platform::CPUPlace src_place;
platform::CPUPlace dst_place;

memory::Copy(dst_place, buffer, src_place, &DESC_SIZE, sizeof(size_t));
memory::Copy(dst_place, buffer + sizeof(size_t), src_place, &DATA_SIZE,
sizeof(size_t));
memory::Copy(dst_place, buffer + sizeof(size_t) * 2, src_place,
desc_bytes.c_str(), desc_bytes.size());

PADDLE_ENFORCE(this->numel() != 0, "Serialize an empty Tensor!");

int element_width = holder_->size() / this->numel();
if (platform::is_cpu_place(place)) {
memory::Copy(dst_place, buffer + sizeof(size_t) * 2 + desc_bytes.size(),
boost::get<platform::CPUPlace>(place),
static_cast<char*>(holder_->ptr()) + offset_ / element_width,
DATA_SIZE);
}
#ifdef PADDLE_WITH_GPU
if (platform::is_gpu_place(place)) {
memory::Copy(dst_place, buffer + sizeof(size_t) * 2 + desc_bytes.size(),
boost::get<platform::GPUPlace>(place),
static_cast<char*>(holder_->ptr()) + offset_ / element_width,
DATA_SIZE);
}
#endif

std::string ret(buffer, BUFFER_SIZE);
memory::Free(platform::CPUPlace(), buffer);
return ret;
}

void LoDTensor::DeserializeFromString(const std::string& s,
const platform::Place& dst_place) {

Comment (Contributor): So is this function.

Reply (Contributor Author): See above.

size_t DESC_SIZE = 0, DATA_SIZE = 0;
platform::CPUPlace src_place;
memory::Copy(src_place, &DESC_SIZE, src_place, s.c_str(), sizeof(size_t));
memory::Copy(src_place, &DATA_SIZE, src_place, s.c_str() + sizeof(size_t),
sizeof(size_t));

// parse LoDTensorDesc
LoDTensorProto desc;
desc.ParseFromArray(s.c_str() + sizeof(size_t) * 2, DESC_SIZE);

std::vector<int64_t> dims;
std::copy(desc.dims().begin(), desc.dims().end(), std::back_inserter(dims));
this->Resize(make_ddim(dims));

// parse data type
void* ptr;
if (desc.data_type() == DataType::BOOL)
ptr = this->mutable_data<bool>(dst_place);
if (desc.data_type() == DataType::INT16)
ptr = this->mutable_data<int16_t>(dst_place);
if (desc.data_type() == DataType::INT32)
ptr = this->mutable_data<int32_t>(dst_place);
if (desc.data_type() == DataType::INT64)
ptr = this->mutable_data<int64_t>(dst_place);
// FIXME(dzh): there is no fp16 in standard c++

if (desc.data_type() == DataType::FP32)
ptr = this->mutable_data<float>(dst_place);
if (desc.data_type() == DataType::FP64)
ptr = this->mutable_data<double>(dst_place);

LoD lod;
std::vector<size_t> levels;
for (int i = 0; i < desc.levels().size(); ++i) {
auto current_level = desc.levels()[i].level();
std::copy(current_level.begin(), current_level.end(),
std::back_inserter(levels));
lod.emplace_back(levels);
levels.clear();
}

this->set_lod(lod);

if (platform::is_cpu_place(dst_place)) {
memory::Copy(boost::get<platform::CPUPlace>(dst_place), ptr, src_place,
s.c_str() + sizeof(size_t) * 2 + DESC_SIZE, DATA_SIZE);
}
#ifdef PADDLE_WITH_GPU
if (platform::is_gpu_place(dst_place)) {
memory::Copy(boost::get<platform::GPUPlace>(dst_place), ptr, src_place,
s.c_str() + sizeof(size_t) * 2 + DESC_SIZE, DATA_SIZE);
}
#endif
}

} // namespace framework
} // namespace paddle
22 changes: 22 additions & 0 deletions paddle/framework/lod_tensor.h
@@ -25,6 +25,7 @@
#include "paddle/framework/ddim.h"
#include "paddle/framework/tensor.h"
#include "paddle/platform/enforce.h"
#include "paddle/platform/place.h"

namespace paddle {
namespace framework {
@@ -119,6 +120,27 @@ class LoDTensor : public Tensor {
*/
void ShrinkInLevel(size_t level, size_t elem_begin, size_t elem_end);

/**
 * @brief Serialize the tensor to a byte string.
 * Please check model_format.md for the format detail.
 * NOTE: a GPU tensor will copy its data to CPU implicitly.
 * @return the serialized bytes as a std::string
 */

// FIXME(dzh): Currently, this interface should only be used for
// saving/restoring models and checkpoints. The ParameterServer does not use
// shape information for its optimization, so when we serialize a
// parameter/gradient to a string there, we should serialize the Tensor
// in the ps trainer instead of the LoDTensor.
std::string SerializeToString() const;

/**
 * @brief Deserialize a byte string into this tensor, filling it in place.
 */
void DeserializeFromString(const std::string& s,
const platform::Place& dst_place);

private:
LoD lod_;
};
21 changes: 20 additions & 1 deletion paddle/framework/lod_tensor_test.cc
@@ -38,7 +38,10 @@ class LoDTensorTester : public ::testing::Test {

lod_tensor_.Resize({20 /*batch size*/, 128 /*dim*/});
// malloc memory
lod_tensor_.mutable_data<float>(place);
float* dst_ptr = lod_tensor_.mutable_data<float>(place);
for (int i = 0; i < 20 * 128; ++i) {

Comment (Member): add a const value for 20 * 128

Reply (Contributor Author): Done.

dst_ptr[i] = i;
}

lod_tensor_.set_lod(lod);
}
@@ -102,5 +105,21 @@ TEST_F(LoDTensorTester, ShrinkInLevel) {
ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>());
}

TEST_F(LoDTensorTester, SerializeDeserialize) {
LoDTensor new_lod_tensor = lod_tensor_;
float* src_ptr = lod_tensor_.data<float>();
std::string s = lod_tensor_.SerializeToString();
LoDTensor dst;
dst.DeserializeFromString(s, platform::CPUPlace());
float* dst_ptr = dst.data<float>();
for (int i = 0; i < 20 * 128; ++i) {
EXPECT_EQ(dst_ptr[i], src_ptr[i]);
}

ASSERT_EQ(dst.NumElements(0), 2UL);
ASSERT_EQ(dst.NumElements(1), 4UL);
ASSERT_EQ(dst.NumElements(2), 8UL);
}

} // namespace framework
} // namespace paddle
37 changes: 37 additions & 0 deletions paddle/framework/saver.proto
@@ -0,0 +1,37 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

syntax = "proto2";
package paddle.framework;

import "framework.proto";

/**
 * This file contains necessary information for models, checkpoints, etc.
 */

message LoDInfo { repeated int64 level = 1; }

/**
 * Save the LoDTensorDesc information through LoDTensorProto; its data memory
 * is copied to a C buffer immediately. See model_format.md for details.
 */

message LoDTensorProto {
optional DataType data_type = 1;
repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
repeated LoDInfo levels = 3;
optional int32 lod_level = 4 [ default = 0 ];
}
17 changes: 17 additions & 0 deletions paddle/framework/scope.cc
@@ -62,5 +62,22 @@ void Scope::DropKids() {
kids_.clear();
}

std::vector<std::string> Scope::GetAllNames(bool recursive) const {
  std::vector<std::string> known_vars;
  known_vars.reserve(vars_.size());  // reserve, not resize: resizing would
                                     // insert empty strings into the result

  if (recursive) {
    for (auto& kid : kids_) {
      // pass `recursive` down so grandchildren are collected as well
      auto kid_vars = kid->GetAllNames(recursive);
      for (auto& p : kid_vars) {
        known_vars.emplace_back(p);
      }
    }
  }
  for (auto& p : vars_) {
    known_vars.emplace_back(p.first);
  }
  return known_vars;
}

} // namespace framework
} // namespace paddle
4 changes: 4 additions & 0 deletions paddle/framework/scope.h
@@ -17,6 +17,7 @@ limitations under the License. */
#include <list>
#include <string>
#include <unordered_map>
#include <vector>

#include "paddle/framework/variable.h"
#include "paddle/platform/macros.h"
@@ -62,6 +63,9 @@ class Scope {
/// Drop all kids scopes belonged to this scope.
void DropKids();

// Enumerate all the variables that the current scope contains.
std::vector<std::string> GetAllNames(bool recursive = false) const;

private:
// Call Scope::NewScope for a sub-scope.
explicit Scope(Scope const* parent) : parent_(parent) {}