"Serialize LoDTensor, Save/Restore model" #4602
@@ -0,0 +1,37 @@
# Design Doc: Model Format

## Motivation

A model is the output of the training process. One complete model consists of two parts: the **topology** and the **parameters**. To support industrial deployment, the model format must be self-contained and must not expose any training source code.

In PaddlePaddle, the **topology** is represented as a [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/doc/design/program.md), which describes the model structure. The **parameters** contain all the trainable weights in the model, so we must support large parameters and efficient serialization/deserialization.

## Implementation

The topology is saved as plain text; in detail, a self-contained protobuf file.

The parameters are saved as a binary file. The protobuf message has a [64M size limit](https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.coded_stream#CodedInputStream.SetTotalBytesLimit.details). We ran a [benchmark experiment](https://github.com/PaddlePaddle/Paddle/pull/4610), and its result shows that protobuf does not fit this scenario.

As a result, we designed a particular format for tensor serialization. By default, an arbitrary tensor in Paddle is a [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) and has a description proto, [LoDTensorDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L99). We save the DescProto as the byte-string header; it contains the necessary information, such as the `dims` and the `name` of the tensor, and the `LoD` information in [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/paddle/framework/lod_tensor.md). A tensor stores its values in a contiguous memory buffer; for speed, we dump the raw memory to disk and save it as the byte-string content. So the binary format of one tensor is:

|HeaderLength|ContentLength|**LoDTensorDesc**|**TensorValue**|

In detail, the tensor's byte view is as the table below shows. Note that all signed values are written in little-endian order.

> **Review:** In fact, I think protobuf may be a better choice than the current design. To break through the limitation of 64M, we can divide a big parameter into small ones. So the only concern we need to consider is the packed parameter size.

```text
[offset]  [type]           [description]
0000      32 bit integer   version number
0004      32 bit integer   HeaderLength, the length of LoDTensorDesc
0008      64 bit integer   ContentLength, the length of the LoDTensor buffer
0016      8 bit char       TensorDesc
0017      8 bit char       TensorDesc
...
00100     8 bit char       TensorValue
00101     8 bit char       TensorValue
00102     8 bit char       TensorValue ...
...
```

## Summary

We introduce the model format: the `ProgramDesc` describes the **topology**, and a set of binary tensors in the particular format above holds the **parameters**.
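The fixed-width header above can be assembled with plain buffer writes. A minimal sketch follows (assuming a little-endian host, as the format requires; `PackTensor` and the version constant `kVersion = 1` are illustrative names and values, not part of the PR):

```cpp
#include <cstdint>
#include <string>

// Pack one tensor using the byte layout described above:
// |version (4B)|HeaderLength (4B)|ContentLength (8B)|LoDTensorDesc|TensorValue|
// Assumes a little-endian host, so the integer fields can be written as-is.
std::string PackTensor(const std::string& desc_bytes,
                       const std::string& value_bytes) {
  const uint32_t kVersion = 1;  // hypothetical version number
  const uint32_t header_len = static_cast<uint32_t>(desc_bytes.size());
  const uint64_t content_len = static_cast<uint64_t>(value_bytes.size());

  std::string out;
  out.append(reinterpret_cast<const char*>(&kVersion), sizeof(kVersion));
  out.append(reinterpret_cast<const char*>(&header_len), sizeof(header_len));
  out.append(reinterpret_cast<const char*>(&content_len), sizeof(content_len));
  out += desc_bytes;   // serialized LoDTensorDesc proto
  out += value_bytes;  // raw tensor memory dump
  return out;
}
```

With a 4-byte desc and a 5-byte value, the packed string is 4 + 4 + 8 + 4 + 5 = 25 bytes, and the desc begins at offset 16, matching the table above.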
@@ -13,6 +13,14 @@
   limitations under the License. */

#include "paddle/framework/lod_tensor.h"
#include "paddle/framework/saver.pb.h"

> **Review:** Add a blank line under `#include "paddle/framework/lod_tensor.h"`.
> **Reply:** Done.

#include "paddle/memory/memcpy.h"
#include "paddle/memory/memory.h"

#include <stdint.h>
#include <string.h>
#include <algorithm>
#include <iterator>

#include <glog/logging.h>
@@ -103,5 +111,139 @@ void LoDTensor::ShrinkInLevel(size_t level, size_t elem_begin,
  lod_ = new_lod;
}

std::string LoDTensor::SerializeToString() const {

> **Review:** There may be more such serialization functions; better not to insert methods that have no relation to computation into `LoDTensor`. This function is also too long; break it up and keep the code clean. If so much code is added that has no relation to the definition or operation of `LoDTensor`, it should live elsewhere.
> **Reply:** After a talk with @Superjom face to face, my opinion on this question is as below. Thanks for this comment!

  LoDTensorProto desc;

  // set data_type
  if (this->type() == typeid(int8_t)) desc.set_data_type(DataType::BOOL);
  if (this->type() == typeid(int16_t)) desc.set_data_type(DataType::INT16);
  if (this->type() == typeid(int32_t)) desc.set_data_type(DataType::INT32);
  if (this->type() == typeid(int64_t)) desc.set_data_type(DataType::INT64);
  // FIXME(dzh): there is no fp16 in standard c++

> **Review:** Put a lookup map somewhere reusable, maybe as a member of a struct; it can be reused and makes the code cleaner:
>
>     map<const typeinfo&, DataType> typeid2datatype = {
>         {typeid(int8_t), DataType::BOOL},
>         {typeid(int16_t), DataType::INT16},
>         ....
>     };
>     desc.set_data_type(typeid2datatype[this->type()]);

  if (this->type() == typeid(float))  // NOLINT
    desc.set_data_type(DataType::FP32);
  if (this->type() == typeid(double))  // NOLINT
    desc.set_data_type(DataType::FP64);

  // set dims
  std::vector<int64_t> dims = vectorize(this->dims());
  for (auto& dim : dims) {
    desc.add_dims(dim);
  }

> **Review:** Maybe `for (int i = 0; i < dims().size(); ++i) { desc.add_dims(dims()[i]); }` is enough; no need to create a new vector.
> **Reply:** Done.

  // set lod information
  desc.set_lod_level(this->NumLevels());
  for (size_t i = 0; i < this->NumLevels(); ++i) {
    LoDInfo* lod = desc.add_levels();
    for (size_t j = 0; j < lod_[i].size(); ++j) {
      lod->add_level(lod_[i][j]);
    }
  }

  // set place information
  platform::Place place = holder_->place();

  std::string desc_bytes = desc.SerializeAsString();

  // FIXME(dzh): implement fixed chunk size buffer.
  size_t DESC_SIZE = desc_bytes.size();
  size_t DATA_SIZE = holder_->size() - offset_;

  const size_t BUFFER_SIZE = DESC_SIZE + DATA_SIZE + 2 * sizeof(size_t);
  char* buffer =
      static_cast<char*>(memory::Alloc(platform::CPUPlace(), BUFFER_SIZE));

  // format: desc_size, data_size, desc_bytes, data_bytes.
  platform::CPUPlace src_place;
  platform::CPUPlace dst_place;

  memory::Copy(dst_place, buffer, src_place, &DESC_SIZE, sizeof(size_t));
  memory::Copy(dst_place, buffer + sizeof(size_t), src_place, &DATA_SIZE,
               sizeof(size_t));
  memory::Copy(dst_place, buffer + sizeof(size_t) * 2, src_place,
               desc_bytes.c_str(), desc_bytes.size());

  PADDLE_ENFORCE(this->numel() != 0, "Serialize an empty Tensor!");

  int element_width = holder_->size() / this->numel();
  if (platform::is_cpu_place(place)) {
    memory::Copy(dst_place, buffer + sizeof(size_t) * 2 + desc_bytes.size(),
                 boost::get<platform::CPUPlace>(place),
                 static_cast<char*>(holder_->ptr()) + offset_ / element_width,
                 DATA_SIZE);
  }
#ifdef PADDLE_WITH_GPU
  if (platform::is_gpu_place(place)) {
    memory::Copy(dst_place, buffer + sizeof(size_t) * 2 + desc_bytes.size(),
                 boost::get<platform::GPUPlace>(place),
                 static_cast<char*>(holder_->ptr()) + offset_ / element_width,
                 DATA_SIZE);
  }
#endif

  std::string ret(buffer, BUFFER_SIZE);
  memory::Free(platform::CPUPlace(), buffer);
  return ret;
}
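The reviewer's suggested lookup table can be made into valid, standalone C++ with `std::type_index`, since a bare `std::type_info` reference cannot serve as a map key. This sketch uses a hypothetical local `DataType` enum standing in for the one generated from framework.proto; the helper name `ToDataType` is illustrative, not part of the PR:

```cpp
#include <cstdint>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical stand-in for the DataType enum generated from framework.proto.
enum class DataType { BOOL, INT16, INT32, INT64, FP32, FP64 };

// One shared table replaces the chain of if-statements. std::type_index
// wraps std::type_info so it can be used as an associative-container key.
DataType ToDataType(const std::type_info& type) {
  static const std::unordered_map<std::type_index, DataType> kTypeMap = {
      {typeid(int8_t), DataType::BOOL},  // int8 stored as BOOL, as in the PR
      {typeid(int16_t), DataType::INT16},
      {typeid(int32_t), DataType::INT32},
      {typeid(int64_t), DataType::INT64},
      {typeid(float), DataType::FP32},
      {typeid(double), DataType::FP64},
  };
  return kTypeMap.at(type);  // throws std::out_of_range for unknown types
}
```

Using `.at()` rather than `operator[]` keeps the table `const` and turns an unmapped type into an exception instead of silently inserting a default entry.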
void LoDTensor::DeserializeFromString(const std::string& s,
                                      const platform::Place& dst_place) {

> **Review:** The same concern as above applies to this function.
> **Reply:** See above.

  size_t DESC_SIZE, DATA_SIZE;
  DESC_SIZE = DATA_SIZE = 100;
  platform::CPUPlace src_place;
  memory::Copy(src_place, &DESC_SIZE, src_place, s.c_str(), sizeof(size_t));
  memory::Copy(src_place, &DATA_SIZE, src_place, s.c_str() + sizeof(size_t),
               sizeof(size_t));

  // parse LoDTensorDesc
  LoDTensorProto desc;
  desc.ParseFromArray(s.c_str() + sizeof(size_t) * 2, DESC_SIZE);

  std::vector<int64_t> dims;
  std::copy(desc.dims().begin(), desc.dims().end(), std::back_inserter(dims));
  this->Resize(make_ddim(dims));

  // parse data type
  void* ptr;
  if (desc.data_type() == DataType::BOOL)
    ptr = this->mutable_data<bool>(dst_place);
  if (desc.data_type() == DataType::INT16)
    ptr = this->mutable_data<int16_t>(dst_place);
  if (desc.data_type() == DataType::INT32)
    ptr = this->mutable_data<int32_t>(dst_place);
  if (desc.data_type() == DataType::INT64)
    ptr = this->mutable_data<int64_t>(dst_place);
  // FIXME(dzh): there is no fp16 in standard c++

  if (desc.data_type() == DataType::FP32)
    ptr = this->mutable_data<float>(dst_place);
  if (desc.data_type() == DataType::FP64)
    ptr = this->mutable_data<double>(dst_place);

  LoD lod;
  std::vector<size_t> levels;
  for (int i = 0; i < desc.levels().size(); ++i) {
    auto current_level = desc.levels()[i].level();
    std::copy(current_level.begin(), current_level.end(),
              std::back_inserter(levels));
    lod.emplace_back(levels);
    levels.clear();
  }

  this->set_lod(lod);

  if (platform::is_cpu_place(dst_place)) {
    memory::Copy(boost::get<platform::CPUPlace>(dst_place), ptr, src_place,
                 s.c_str() + sizeof(size_t) * 2 + DESC_SIZE, DATA_SIZE);
  }
#ifdef PADDLE_WITH_GPU
  if (platform::is_gpu_place(dst_place)) {
    memory::Copy(boost::get<platform::GPUPlace>(dst_place), ptr, src_place,
                 s.c_str() + sizeof(size_t) * 2 + DESC_SIZE, DATA_SIZE);
  }
#endif
}

}  // namespace framework
}  // namespace paddle
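For clarity, the in-memory layout that `SerializeToString` produces and `DeserializeFromString` consumes (`desc_size`, `data_size`, `desc_bytes`, `data_bytes`, with the two sizes stored as host `size_t`) can be read back by a small standalone helper. `UnpackTensorString` and the `Unpacked` struct are illustrative names, not part of the PR:

```cpp
#include <cstring>
#include <string>

// Holds the two variable-length sections of the serialized string.
struct Unpacked {
  std::string desc_bytes;  // serialized LoDTensorProto
  std::string data_bytes;  // raw tensor memory
};

// Split |desc_size|data_size|desc_bytes|data_bytes| back into its parts.
// The two leading sizes are host size_t values, matching the PR's code
// rather than the fixed-width header in the design doc.
Unpacked UnpackTensorString(const std::string& s) {
  size_t desc_size = 0, data_size = 0;
  std::memcpy(&desc_size, s.data(), sizeof(size_t));
  std::memcpy(&data_size, s.data() + sizeof(size_t), sizeof(size_t));

  Unpacked out;
  out.desc_bytes.assign(s.data() + 2 * sizeof(size_t), desc_size);
  out.data_bytes.assign(s.data() + 2 * sizeof(size_t) + desc_size, data_size);
  return out;
}
```

Note that because the sizes are raw `size_t`, such a string is only portable between machines with the same word size and endianness; the design doc's fixed-width little-endian header avoids that caveat.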
@@ -38,7 +38,10 @@ class LoDTensorTester : public ::testing::Test {

  lod_tensor_.Resize({20 /*batch size*/, 128 /*dim*/});
  // malloc memory
  float* dst_ptr = lod_tensor_.mutable_data<float>(place);
  for (int i = 0; i < 20 * 128; ++i) {
    dst_ptr[i] = i;
  }

  lod_tensor_.set_lod(lod);
}

> **Review:** Add a named constant for `20 * 128`.
> **Reply:** Done.

@@ -102,5 +105,21 @@ TEST_F(LoDTensorTester, ShrinkInLevel) {
  ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>());
}

TEST_F(LoDTensorTester, SerializeDeserialize) {
  LoDTensor new_lod_tensor = lod_tensor_;
  float* src_ptr = lod_tensor_.data<float>();
  std::string s = lod_tensor_.SerializeToString();
  LoDTensor dst;
  dst.DeserializeFromString(s, platform::CPUPlace());
  float* dst_ptr = dst.data<float>();
  for (int i = 0; i < 20 * 128; ++i) {
    EXPECT_EQ(dst_ptr[i], src_ptr[i]);
  }

  ASSERT_EQ(dst.NumElements(0), 2UL);
  ASSERT_EQ(dst.NumElements(1), 4UL);
  ASSERT_EQ(dst.NumElements(2), 8UL);
}

}  // namespace framework
}  // namespace paddle
@@ -0,0 +1,37 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

syntax = "proto2";
package paddle.framework;

import "framework.proto";

/**
 * This file contains the necessary information for models, checkpoints,
 * etc.
 */

message LoDInfo { repeated int64 level = 1; }

/**
 * Save the LoDTensorDesc information through LoDTensorProto; its data memory
 * is copied to a C buffer immediately. See model_format.md for details.
 */

message LoDTensorProto {
  optional DataType data_type = 1;
  repeated int64 dims = 2;  // [UNK, 640, 480] is saved as [-1, 640, 480]
  repeated LoDInfo levels = 3;
  optional int32 lod_level = 4 [ default = 0 ];
}
> **Review:** Maybe `business deployment` => `industrial deployment`?
> **Reply:** Done.