
Saving all trained params in a single file #7722

Closed
sidgoyal78 opened this issue Jan 22, 2018 · 6 comments
Assignees: sidgoyal78
Labels: 预测 (Inference — formerly named Inference; includes C-API inference issues, etc.)

Comments

@sidgoyal78 (Contributor)

Merging all params into a single file

For inference, we will have 2 files: one for the programDesc and one that has all the params together. We look at one approach to do this.

Understanding save/load ops (C++ side)

  • The model_format design doc gives some details in a table, but it is not entirely clear, so we look at the implementation.

To understand the current serialization, we look at save_op:

  • In save_op the main work is performed by SerializeToStream(<ofstream>, <framework::LoDTensor>, ...) (code). This function saves a version number, the size of the LoD, and the actual LoD data.

  • Then it calls SerializeToStream(<ofstream>, <Tensor>, ...) (code). This function saves a version number, the tensor description as a serialized protobuf, and the actual data.

The corresponding load_op does the deserialization accordingly, respecting the ordering in save_op.
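
To make that ordering concrete, here is a minimal Python sketch of the resulting stream layout. Only the field order comes from the code above; the integer widths, the version values, and the per-level LoD encoding are assumptions for illustration.

    import struct

    def serialize_lod_tensor(stream, lod, tensor_desc_bytes, tensor_data):
        # SerializeToStream(<ofstream>, <framework::LoDTensor>, ...):
        # version number, size of LoD, actual LoD data.
        stream.write(struct.pack("<I", 0))               # version (width assumed)
        stream.write(struct.pack("<Q", len(lod)))        # size of LoD
        for level in lod:
            stream.write(struct.pack("<Q", len(level)))  # entries in this level
            stream.write(struct.pack("<%dQ" % len(level), *level))
        # SerializeToStream(<ofstream>, <Tensor>, ...):
        # version number, serialized TensorDesc protobuf, actual data.
        stream.write(struct.pack("<I", 0))               # version (width assumed)
        stream.write(struct.pack("<I", len(tensor_desc_bytes)))
        stream.write(tensor_desc_bytes)                  # TensorDesc protobuf bytes
        stream.write(tensor_data)                        # raw tensor contents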

Understanding how a model is saved (python api)

Now we look at how save/load works for saving actual model params by examining the implementation of save_vars in fluid (code). We see that a new program is created and a save op is appended for each variable that is persistable. Then the executor runs this program.
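
As a paraphrased sketch of that pattern (save_persistables_sketch is a hypothetical helper, and the fluid calls are written from memory of the API of that era — the fluid module path moved around in early releases — so treat the details as approximate):

    import os
    import paddle.fluid as fluid

    def save_persistables_sketch(executor, dirname, main_program):
        # A new program whose only job is to run one save op per parameter.
        save_program = fluid.Program()
        save_block = save_program.global_block()
        for var in main_program.list_vars():
            if not var.persistable:
                continue
            # Clone the variable into the new block so the save op can
            # reference it (the real save_vars clones vars similarly).
            new_var = save_block.create_var(
                name=var.name, shape=var.shape, dtype=var.dtype,
                persistable=True)
            save_block.append_op(
                type='save',
                inputs={'X': [new_var]},
                outputs={},
                attrs={'file_path': os.path.join(dirname, new_var.name)})
        executor.run(save_program)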

Approach

We basically make two assumptions:

  • For both load/save, the order of iterating over the variables is the same. (This should hopefully be true)
  • We don't worry about the overwrite option which is in save_op.

While saving:

  • We store a uint64_t number in addition to the actual serialized bytes (which are the same as in the original save). This number tells us the size of the serialized LoDTensor in bytes.

  • When save is called for the first time, we create a file and build a string containing the serialized LoDTensor data. We first store the size of this string as a fixed-width (uint64_t) number, and then store the string itself.

  • When save is called later, we go to the end of the file and append two things: the size of the string and the string itself. (See the sketch below.)
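
A minimal sketch of this saving scheme, assuming the serialized LoDTensor bytes are already in hand (new_save here names the proposed op; overwrite handling is deferred per assumption 2):

    import struct

    def new_save(file_path, serialized_lod_tensor, counter):
        # counter == 0: first parameter, create/truncate the file;
        # counter  > 0: later parameters, append at the end of the file.
        mode = "wb" if counter == 0 else "ab"
        with open(file_path, mode) as f:
            f.write(struct.pack("<Q", len(serialized_lod_tensor)))  # uint64_t size
            f.write(serialized_lod_tensor)                          # the chunk itself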

While loading:

  • We pass an additional attribute in order to load the correct chunk for each parameter: a counter value that gives the relative order of the params, counting from 0.

  • With this counter and the extra size information that we stored, we can hop to the appropriate part of the file, read the chunk, and deserialize it. (See the sketch below.)
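
A matching sketch of the loading side: hop counter chunks using the stored uint64 sizes, then read and return the serialized bytes for this parameter (deserializing them back into a LoDTensor is left out):

    import struct

    def new_load(file_path, counter):
        with open(file_path, "rb") as f:
            for _ in range(counter):                  # skip the earlier parameters
                (size,) = struct.unpack("<Q", f.read(8))
                f.seek(size, 1)                       # hop over one whole chunk
            (size,) = struct.unpack("<Q", f.read(8))
            return f.read(size)                       # serialized LoDTensor bytes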

For implementation, I think it will be better to have another op for this rather than replacing the original save_op/load_op: it is easier to debug, and I don't know the details of how load_op and save_op are used in the distributed version as of now.

@sidgoyal78 self-assigned this Jan 22, 2018
@sidgoyal78 added the 预测 (Inference) label Jan 22, 2018
@sidgoyal78 (Contributor, Author)

I think we can even handle the "overwrite" file case. We can keep a counter for the save op: if the counter is 0, the file exists, and overwrite = false, we abort. But if the counter is 0, the file exists, and overwrite = true, we create a new file (open the file in write mode).
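
A small sketch of that overwrite handling (open_for_save is a hypothetical helper; the file modes follow the comment above):

    import os

    def open_for_save(file_path, counter, overwrite):
        if counter == 0:
            if os.path.exists(file_path) and not overwrite:
                # counter = 0, file exists, overwrite = false: abort.
                raise RuntimeError("%s exists and overwrite is false" % file_path)
            return open(file_path, "wb")   # counter = 0: create a new file
        return open(file_path, "ab")       # counter > 0: append to the file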

@Xreki (Contributor) commented Jan 22, 2018

Let me take an example to explain the saving format as I understand it.

If there are two parameters, fc0.w0 and fc0.b0, they will be saved in a file as:

uint64_t: size of fc0.w0
string: serialized LoDTensor data of fc0.w0
uint64_t: size of fc0.b0
string: serialized LoDTensor data of fc0.b0

Am I right? I think we may need another string to record the parameter's name.

string: fc0.w0
uint64_t: size of fc0.w0
string: serialized LoDTensor data of fc0.w0
string: fc0.b0
uint64_t: size of fc0.b0
string: serialized LoDTensor data of fc0.b0

> For implementation, I think it will be better to have another op for this rather than replacing the original save_op/load_op: it is easier to debug, and I don't know the details of how load_op and save_op are used in the distributed version as of now.

In the first implementation, we may fill the parameter's Tensor in the Load() function directly.

@sidgoyal78 (Contributor, Author) commented Jan 22, 2018

Yes, you are right. I think the 'name' of the param is not currently stored: from the implementation (ref), we see only the TensorDesc is stored. So it should be merged with the serialized LoDTensor data, and then the size should be generated (see comment below).

So I am thinking of it more like this:

uint64_t: size of fc0.w0 + size of string fc0.w0 + 1
string: lengthof(fc0.w0) + fc0.w0 + serialized LoDTensor data of fc0.w0
uint64_t: size of fc0.b0 + size of string fc0.b0 + 1
string: lengthof(fc0.b0) + fc0.b0 + serialized LoDTensor data of fc0.b0

(Otherwise, we won't know the size of the string fc0.w0/fc0.b0 beforehand, so we need to merge it with the serialization of the LoDTensor and then generate the size.)
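
A sketch of this merged layout, assuming the "+ 1" above means a one-byte length prefix for the name; pack_named_chunk/unpack_named_chunk are hypothetical helpers for illustration:

    import struct

    def pack_named_chunk(name, serialized_lod_tensor):
        name_bytes = name.encode("utf-8")
        # lengthof(name) + name + serialized LoDTensor data, all in one chunk.
        body = struct.pack("<B", len(name_bytes)) + name_bytes + serialized_lod_tensor
        # The leading uint64_t covers the merged chunk, so the reader need not
        # know the name's size beforehand.
        return struct.pack("<Q", len(body)) + body

    def unpack_named_chunk(body):
        (name_len,) = struct.unpack_from("<B", body, 0)
        name = body[1:1 + name_len].decode("utf-8")
        return name, body[1 + name_len:]          # (name, serialized LoDTensor)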

@sidgoyal78 (Contributor, Author) commented Jan 22, 2018

@Xreki: I think the name isn't required; it is obtained from the programDesc and passed accordingly (code). And since we are storing the programDesc as a protobuf, I don't see the need to store it again (provided the ordering when iterating through load_vars and save_vars remains the same, i.e., assumption 1 holds true).

For a concrete example, suppose we have fc1, b1, fc2, b2. The order will be the same when we call load_vars or save_vars.

We iterate over this list, and pass an additional counter for save and load:

While saving:

  1. We call new_save with counter = 0 and a filepath:
uint64_t: 340 (for example)
340 bytes of fc1 data and its desc
  2. We call new_save with counter = 1 and the same filepath. We go to the end of the file and append, so the file becomes:
uint64_t: 340 (for example)
340 bytes of fc1 data and its desc
uint64_t: 70 (for example)
70 bytes of b1 data and its desc

...

We finally get:

uint64_t: 340 (for example)
340 bytes of fc1 data and its desc
uint64_t: 70 (for example)
70 bytes of b1 data and its desc
uint64_t: 640 (for example)
640 bytes of fc2 data and its desc
uint64_t: 80 (for example)
80 bytes of b2 data and its desc

Now, while loading, we proceed as follows:

  1. We call new_load with counter = 0, op fc1, and the above filepath.
    Since counter = 0, we don't hop at all; we read the first uint64 and then read the string and convert it into fc1.
uint64_t: 340 (for example)                  <-- we proceed till here
340 bytes of fc1 data and its desc
uint64_t: 70 (for example)
70 bytes of b1 data and its desc
uint64_t: 640 (for example)
640 bytes of fc2 data and its desc
uint64_t: 80 (for example)
80 bytes of b2 data and its desc

Now we read the uint64 to get the size of fc1, read that many bytes, and deserialize to obtain fc1.

  2. We call new_load with counter = 1, op b1, and the above filepath.
    Now, since counter = 1, we hop once, going from the first uint64 to the next one:
uint64_t: 340 (for example)
340 bytes of fc1 data and its desc
uint64_t: 70 (for example)                   <-- we proceed till here
70 bytes of b1 data and its desc
uint64_t: 640 (for example)
640 bytes of fc2 data and its desc
uint64_t: 80 (for example)
80 bytes of b2 data and its desc

Now we read the uint64 to get the size of b1, read that many bytes, and deserialize to obtain b1.

...

  4. We call new_load with counter = 3, op b2, and the above filepath.
    Now, since counter = 3, we hop three times, going from the first uint64 to the fourth one:
uint64_t: 340 (for example)
340 bytes of fc1 data and its desc
uint64_t: 70 (for example)
70 bytes of b1 data and its desc
uint64_t: 640 (for example)
640 bytes of fc2 data and its desc
uint64_t: 80 (for example)                         <-- we proceed till here
80 bytes of b2 data and its desc

Now we read the uint64 to get the size of b2, read that many bytes, and deserialize to obtain b2.
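
Putting the new_save and new_load sketches from the earlier comments together, a quick end-to-end check of this walkthrough (the byte strings stand in for each parameter's serialized data-plus-desc, with sizes matching the example):

    params = [("fc1", b"\x01" * 340), ("b1", b"\x02" * 70),
              ("fc2", b"\x03" * 640), ("b2", b"\x04" * 80)]

    # Save in order, with the counter counting from 0 (appends after the first call).
    for counter, (_, blob) in enumerate(params):
        new_save("model.params", blob, counter)

    # counter = 3 hops over fc1, b1, and fc2, landing on b2.
    assert new_load("model.params", 3) == b"\x04" * 80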

@Xreki (Contributor) commented Jan 23, 2018

OK. The design is based on ordering: we need to make sure the loading order is exactly the same as the saving order.

@Xreki (Contributor) commented Feb 9, 2018

Fixed by #7995 and #7909

@Xreki closed this as completed Feb 9, 2018