fleet.save_inference_model fails with ERROR: A protocol message was rejected because it was too big (more than 67108864 bytes). #3225
Comments
We are trying to reproduce this locally and will keep following up.
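One notable problem found while reproducing: building the network and the program takes an abnormally long time. For a wide&deep implementation, you can refer to the official example: https://github.com/PaddlePaddle/models/tree/8bca0e4311b444a61024c2a5dd755a22b47487da/legacy/ctr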
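I am not implementing wide&deep. As I said on QQ, I take the second-order interaction values out of the FM and feed them, together with the DNN input, into a softmax.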
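Hello, I tried to reproduce your error locally with your network definition and code, but the problem did not occur while saving the model, and the model was saved successfully. Please share your runtime environment and Paddle version so that I can keep following up.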
I tried save_persistables and it did not help. The version is paddle-fleet-release:v1.5, running in a Docker container, which should not matter.
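Hello, does save_persistables produce the same error? Also, what is your distributed configuration: how many trainers and how many pservers? For asynchronous training with dataset, note these key settings: DistributeTranspilerConfig().sync_mode = False, together with DistributeTranspilerConfig().runtime_split_send_recv = True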
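save_persistables gives the same error. Other models run fine; I wanted to try this one because the model above reaches a high training AUC. We usually use 16 pservers. The settings you mention are already in place.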
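Hello, we cannot reproduce your problem in our environment. Other users have reported a similar problem before; could you check whether that helps?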
from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet
fleet.save_inference_model(executor=exe, dirname=model_dir, feeded_var_names=feed_var_names, target_vars=[auc_var, batch_auc_var])
Saving the model with the call above fails:
/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py:1084: UserWarning: save_inference_model specified the param `program_only` to True, It will not save params of Program.
2019-08-28 21:33:25 2019-08-28 13:33:23,876 [INFO] [10.38.26.135] --- "save_inference_model specified the param `program_only` to True, It will not save params of Program."
2019-08-28 21:33:25 2019-08-28 13:33:24,698 [INFO] [10.38.26.135] --- [libprotobuf ERROR /paddle/build/third_party/protobuf/src/extern_protobuf/src/google/protobuf/io/coded_stream.cc:208] A protocol message was rejected because it was too big (more than 67108864 bytes). To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
2019-08-28 21:33:25 2019-08-28 13:33:25,202 [INFO] [10.38.26.135] --- Traceback (most recent call last):
2019-08-28 21:33:25 2019-08-28 13:33:25,203 [INFO] [10.38.26.135] --- File "/paddle/task-20190828201029-87895/dnn_dense_interaction.py", line 374, in <module>
2019-08-28 21:33:26 2019-08-28 13:33:25,203 [INFO] [10.38.26.135] --- train()
2019-08-28 21:33:26 2019-08-28 13:33:25,203 [INFO] [10.38.26.135] --- File "/paddle/task-20190828201029-87895/dnn_dense_interaction.py", line 363, in train
2019-08-28 21:33:26 2019-08-28 13:33:25,203 [INFO] [10.38.26.135] --- fleet.save_inference_model(executor=exe, dirname=model_dir, feeded_var_names=feed_var_names, target_vars=[auc_var, batch_auc_var])
2019-08-28 21:33:26 2019-08-28 13:33:25,203 [INFO] [10.38.26.135] --- File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/incubate/fleet/parameter_server/distribute_transpiler/__init__.py", line 157, in save_inference_model
2019-08-28 21:33:26 2019-08-28 13:33:25,203 [INFO] [10.38.26.135] --- program = Program.parse_from_string(program_desc_str)
2019-08-28 21:33:26 2019-08-28 13:33:25,203 [INFO] [10.38.26.135] --- File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 3315, in parse_from_string
2019-08-28 21:33:26 2019-08-28 13:33:25,203 [INFO] [10.38.26.135] --- p.desc = core.ProgramDesc(binary_str)
2019-08-28 21:33:26 2019-08-28 13:33:25,204 [INFO] [10.38.26.135] --- paddle.fluid.core_avx.EnforceNotMet: Fail to parse program_desc from binary string. at [/paddle/paddle/fluid/framework/program_desc.cc:95]
2019-08-28 21:33:26 2019-08-28 13:33:25,204 [INFO] [10.38.26.135] --- PaddlePaddle Call Stacks:
2019-08-28 21:33:26 2019-08-28 13:33:25,204 [INFO] [10.38.26.135] --- 0 0x7fa349ef999ap void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 506
2019-08-28 21:33:26 2019-08-28 13:33:25,204 [INFO] [10.38.26.135] --- 1 0x7fa349efa6a5p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 165
2019-08-28 21:33:26 2019-08-28 13:33:25,204 [INFO] [10.38.26.135] --- 2 0x7fa34a0c97bep paddle::framework::ProgramDesc::ProgramDesc(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 782
2019-08-28 21:33:26 2019-08-28 13:33:25,204 [INFO] [10.38.26.135] --- 3 0x7fa349fbbb66p
2019-08-28 21:33:26 2019-08-28 13:33:25,204 [INFO] [10.38.26.135] --- 4 0x7fa349f26a14p
2019-08-28 21:33:26 2019-08-28 13:33:25,204 [INFO] [10.38.26.135] --- 5 0x4eef5ep
2019-08-28 21:33:26 2019-08-28 13:33:25,205 [INFO] [10.38.26.135] --- 15 0x4b9b66p PyEval_EvalCodeEx + 774
2019-08-28 21:33:26 2019-08-28 13:33:25,205 [INFO] [10.38.26.135] --- 16 0x4eb69fp
2019-08-28 21:33:26 2019-08-28 13:33:25,205 [INFO] [10.38.26.135] --- 17 0x4e58f2p PyRun_FileExFlags + 130
2019-08-28 21:33:26 2019-08-28 13:33:25,205 [INFO] [10.38.26.135] --- 13 0x4b9b66p PyEval_EvalCodeEx + 774
2019-08-28 21:33:26 2019-08-28 13:33:25,205 [INFO] [10.38.26.135] --- 14 0x4c1f56p PyEval_EvalFrameEx + 24694
2019-08-28 21:33:26 2019-08-28 13:33:25,204 [INFO] [10.38.26.135] --- 6 0x4eeb66p
2019-08-28 21:33:26 2019-08-28 13:33:25,204 [INFO] [10.38.26.135] --- 7 0x4aaafbp
2019-08-28 21:33:26 2019-08-28 13:33:25,205 [INFO] [10.38.26.135] --- 8 0x4c166dp PyEval_EvalFrameEx + 22413
2019-08-28 21:33:26 2019-08-28 13:33:25,205 [INFO] [10.38.26.135] --- 9 0x4b9b66p PyEval_EvalCodeEx + 774
2019-08-28 21:33:26 2019-08-28 13:33:25,205 [INFO] [10.38.26.135] --- 10 0x4c1f56p PyEval_EvalFrameEx + 24694
2019-08-28 21:33:26 2019-08-28 13:33:25,205 [INFO] [10.38.26.135] --- 11 0x4b9b66p PyEval_EvalCodeEx + 774
2019-08-28 21:33:26 2019-08-28 13:33:25,205 [INFO] [10.38.26.135] --- 12 0x4c17c6p PyEval_EvalFrameEx + 22758
2019-08-28 21:33:26 2019-08-28 13:33:25,205 [INFO] [10.38.26.135] --- 18 0x4e41a6p PyRun_SimpleFileExFlags + 390
2019-08-28 21:33:26 2019-08-28 13:33:25,205 [INFO] [10.38.26.135] --- 19 0x4938cep Py_Main + 1358
2019-08-28 21:33:26 2019-08-28 13:33:25,205 [INFO] [10.38.26.135] --- 20 0x7fa3bcf7a830p __libc_start_main + 240
2019-08-28 21:33:26 2019-08-28 13:33:25,205 [INFO] [10.38.26.135] --- 21 0x493299p _start + 41
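The libprotobuf message above says the parse was rejected because the message exceeded 67108864 bytes, which is 64 MiB. As a rough diagnostic, the size of the serialized program can be compared against that figure before it is parsed. The sketch below is only illustrative: the helper names `exceeds_protobuf_limit` and `describe_size` are hypothetical, and it assumes the serialized program is available as a byte string (in Paddle 1.5 this would presumably come from something like `program.desc.serialize_to_string()`, which is an assumption and not confirmed by this thread).

```python
# Limit quoted in the libprotobuf error message: 67108864 bytes (64 MiB).
PROTOBUF_DEFAULT_LIMIT_BYTES = 67108864


def exceeds_protobuf_limit(serialized, limit=PROTOBUF_DEFAULT_LIMIT_BYTES):
    """Return True if the serialized message is larger than the limit.

    `serialized` is a byte string. The error text says "more than", so the
    comparison is strict. (Hypothetical helper, not part of Paddle.)
    """
    return len(serialized) > limit


def describe_size(serialized, limit=PROTOBUF_DEFAULT_LIMIT_BYTES):
    """Return a short human-readable summary of the message size."""
    size = len(serialized)
    mib = size / (1024.0 * 1024.0)
    status = "over" if size > limit else "within"
    return "%d bytes (%.1f MiB), %s the %d-byte limit" % (size, mib, status, limit)
```

If the reported size is over the limit, the program description itself has grown past 64 MiB. That points at the size of the network definition, not at the parameters, since the warning above states that params are not saved when `program_only` is True.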