[BUG] Model compression fails for some descriptor neuron structures #1370

ssrokyz · 2021-12-22T02:30:59Z

Summary

When I compress the model, compression fails for some descriptor neuron structures.
With the descriptor `neuron` set to [25, 50, 100], it succeeds.
With [50, 50, 50] or [100, 50, 25], it fails.

DeePMD-kit version, installation method, input file, running commands, error log, etc.
DeePMD-kit version: v2.0.2
installation: source install
running command: `dp compress -i frozen_model.pb -t input.json`
The frozen_model.pb file is attached:
frozen_model.zip

input.json:
```json
{
    "_comment": " model parameters",
    "model": {
        "type_map": ["Ge", "Sb", "Te"],
        "descriptor": {
            "type": "se_e2_a",
            "sel": [30, 30, 30],
            "rcut_smth": 0.10,
            "rcut": 6.00,
            "neuron": [100, 50, 25],
            "resnet_dt": false,
            "axis_neuron": 16,
            "activation_function": "tanh",
            "seed": 1,
            "_comment": " that's all"
        },
        "fitting_net": {
            "neuron": [200, 200, 200],
            "activation_function": "tanh",
            "resnet_dt": true,
            "seed": 1,
            "_comment": " that's all"
        },
        "_comment": " that's all"
    },

    "learning_rate": {
        "type": "exp",
        "start_lr": 1e-3,
        "stop_lr": 1e-3,
        "_comment": "that's all"
    },

    "loss": {
        "type": "ener",
        "start_pref_e": 1e-2,
        "limit_pref_e": 1e-2,
        "start_pref_f": 1e-0,
        "limit_pref_f": 1e-0,
        "start_pref_v": 1e-2,
        "limit_pref_v": 1e-2,
        "_comment": " that's all"
    },

    "training": {
        "training_data": {
            "systems": [
                "../raw-sffld_gst-hex-gather.traj.d/train",
                "../raw-sffld_gst-cubic-gather.traj.d/train",
                "../raw-sffld_gs-cubic-liquid-gather.traj.d",
                "../raw-sffld_sb-hex-liquid-8th.traj.d",
                "../raw-sffld_ge-cubic-liquid-gather.traj.d",
                "../raw-sffld_te-cubic-liquid-gather.traj.d"
            ],
            "batch_size": 2,
            "_comment": "that's all"
        },
        "validation_data": {
            "systems": [
                "../raw-sffld_gst-hex-gather.traj.d/test",
                "../raw-sffld_gst-cubic-gather.traj.d/test"
            ],
            "batch_size": 50,
            "numb_btch": 1,
            "_comment": "that's all"
        },
        "numb_steps": 2000000,
        "seed": 10,
        "disp_file": "lcurve.out",
        "disp_freq": 1000,
        "save_freq": 1000,
        "save_ckpt": "bu/model.ckpt",
        "_comment": "that's all"
    },

    "_comment": "that's all"
}
```

Error log:
```text
Traceback (most recent call last):
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1853, in _create_c_op
    c_op = pywrap_tf_session.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimensions must be equal, but are 50 and 200 for '{{node add_4/add}} = AddV2[T=DT_DOUBLE](Tanh_1, concat)' with input shapes: [1501,50], [1501,200].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 1175, in binary_op_wrapper
    out = r_op(x)
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 1195, in r_binary_op_wrapper
    return func(x, y, name=name)
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 1486, in _add_dispatch
    return gen_math_ops.add_v2(x, y, name=name)
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/ops/gen_math_ops.py", line 481, in add_v2
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 748, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 3528, in _create_op_internal
    ret = Operation(
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 2015, in __init__
    self._c_op = _create_c_op(self._graph, node_def, inputs,
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1856, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimensions must be equal, but are 50 and 200 for '{{node add_4/add}} = AddV2[T=DT_DOUBLE](Tanh_1, concat)' with input shapes: [1501,50], [1501,200].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/grad/ssrokyz/.conda/envs/tf24/bin/dp", line 33, in <module>
    sys.exit(load_entry_point('deepmd-kit==2.0.2', 'console_scripts', 'dp')())
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/deepmd_kit-2.0.2-py3.8-linux-x86_64.egg/deepmd/entrypoints/main.py", line 447, in main
    compress(**dict_args)
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/deepmd_kit-2.0.2-py3.8-linux-x86_64.egg/deepmd/entrypoints/compress.py", line 119, in compress
    train(
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/deepmd_kit-2.0.2-py3.8-linux-x86_64.egg/deepmd/entrypoints/train.py", line 103, in train
    _do_work(jdata, run_opt, is_compress)
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/deepmd_kit-2.0.2-py3.8-linux-x86_64.egg/deepmd/entrypoints/train.py", line 159, in _do_work
    model.build(train_data, stop_batch)
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/deepmd_kit-2.0.2-py3.8-linux-x86_64.egg/deepmd/train/trainer.py", line 285, in build
    self.descrpt.enable_compression(self.model_param['compress']["min_nbor_dist"], self.model_param['compress']['model_file'], self.model_param['compress']['table_config'][0], self.model_param['compress']['table_config'][1], self.model_param['compress']['table_config'][2], self.model_param['compress']['table_config'][3])
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/deepmd_kit-2.0.2-py3.8-linux-x86_64.egg/deepmd/descriptor/se_a.py", line 335, in enable_compression
    = self.table.build(min_nbor_dist,
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/deepmd_kit-2.0.2-py3.8-linux-x86_64.egg/deepmd/utils/tabulate.py", line 148, in build
    vv, dd, d2 = self._make_data(xx, ii)
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/deepmd_kit-2.0.2-py3.8-linux-x86_64.egg/deepmd/utils/tabulate.py", line 236, in _make_data
    tt, zz = self._layer_1(
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/deepmd_kit-2.0.2-py3.8-linux-x86_64.egg/deepmd/utils/tabulate.py", line 255, in _layer_1
    return t, self.activation_fn(tf.matmul(x, w) + b) + t
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 1180, in binary_op_wrapper
    raise e
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 1164, in binary_op_wrapper
    return func(x, y, name=name)
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 1486, in _add_dispatch
    return gen_math_ops.add_v2(x, y, name=name)
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/ops/gen_math_ops.py", line 481, in add_v2
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 748, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 3528, in _create_op_internal
    ret = Operation(
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 2015, in __init__
    self._c_op = _create_c_op(self._graph, node_def, inputs,
  File "/home/grad/ssrokyz/.conda/envs/tf24/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1856, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimensions must be equal, but are 50 and 200 for '{{node add_4}} = AddV2[T=DT_DOUBLE](Tanh_1, concat)' with input shapes: [1501,50], [1501,200].
```
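The shapes in this error can be reproduced outside TensorFlow. The failing line in `tabulate.py` (`self.activation_fn(tf.matmul(x, w) + b) + t`) adds a shortcut `t` to the layer output. Assuming, as the 50-vs-200 mismatch suggests, that `t` is the 100-wide layer input concatenated with itself, the NumPy sketch below mimics that layer with the reporter's [100, 50, 25] widths and the 1501 rows seen in the log. `layer_with_doubling_shortcut` is a hypothetical name and the weights are random placeholders, not values from the frozen model.

```python
import numpy as np

def layer_with_doubling_shortcut(x, w, b):
    """Assumed tabulation layer: tanh(x @ w + b) plus the shortcut concat([x, x])."""
    t = np.concatenate([x, x], axis=1)  # shortcut is always twice as wide as x
    # The sum only works when the layer doubles its width: w.shape[1] == 2 * x.shape[1].
    return np.tanh(x @ w + b) + t

rng = np.random.default_rng(0)
n_points = 1501                            # row count seen in the error message
x = rng.standard_normal((n_points, 100))   # output of the first layer, neuron[0] = 100
w = rng.standard_normal((100, 50))         # second layer, neuron[1] = 50 (halves, not doubles)
b = rng.standard_normal(50)

try:
    layer_with_doubling_shortcut(x, w, b)
except ValueError as err:
    # NumPy refuses to add a (1501, 50) array to a (1501, 200) array,
    # the same 50-vs-200 mismatch TensorFlow reports in the log above.
    print("shape mismatch:", err)
```

Giving `w` a (100, 200) shape instead makes the addition succeed, which under this assumption is why [25, 50, 100] compresses while [100, 50, 25] does not.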


njzjz · 2021-12-23T22:51:24Z

Currently only [25, 50, 100] is supported.

wanghan-iapcm · 2021-12-24T00:03:33Z

We only support the case `neuron[i+1] == neuron[i] * 2`; `neuron[i]` itself can be any number.
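The constraint stated here can be checked before running `dp compress`. The sketch below is a hypothetical pre-flight helper (`check_doubling_rule` is not part of DeePMD-kit); it only encodes the rule as described in this thread for v2.0.2 and does not inspect the frozen model.

```python
def check_doubling_rule(neuron):
    """Return (i, neuron[i], neuron[i+1]) for every step where neuron[i+1] != 2 * neuron[i].

    Hypothetical helper encoding the v2.0.2 compression constraint described
    in this issue; an empty list means the structure should be compressible.
    """
    return [
        (i, neuron[i], neuron[i + 1])
        for i in range(len(neuron) - 1)
        if neuron[i + 1] != 2 * neuron[i]
    ]

# The three structures from the report.
for neuron in ([25, 50, 100], [50, 50, 50], [100, 50, 25]):
    bad = check_doubling_rule(neuron)
    print(neuron, "OK" if not bad else f"unsupported transitions: {bad}")
```

The list can be taken straight from the `input.json` above as `config["model"]["descriptor"]["neuron"]`.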

ssrokyz · 2021-12-24T05:53:00Z

Thanks for the immediate and clear answer!

njzjz · 2022-01-16T06:43:07Z

Feature request in #1422.


Fix #1370. Support compressing any neuron structure; fix compression error when `neuron[0]` is 1 (although no one will set `neuron[0]` to 1). Signed-off-by: Jinzhe Zeng <[email protected]>
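Supporting any neuron structure means the tabulated embedding net has to follow the same shortcut rule as the trained network instead of always assuming a doubling. The sketch below assumes the embedding net uses an identity shortcut when a layer keeps its width, a concatenated `[x, x]` shortcut when it doubles, and no shortcut otherwise (with `resnet_dt` false, as in the reporter's config), and that the input width of 1 is prepended to `neuron`. Under that assumption `neuron[0] == 1` turns the first layer (1 to 1) into an identity-shortcut layer, one plausible reason that case needed its own fix. The function names are hypothetical and this is an illustration, not the code merged in #2933.

```python
import numpy as np

def embedding_layer(x, w, b):
    """One embedding layer with the assumed shortcut rule (resnet_dt = false)."""
    hidden = np.tanh(x @ w + b)
    n_in, n_out = x.shape[1], w.shape[1]
    if n_out == n_in:            # same width: identity shortcut
        return hidden + x
    if n_out == 2 * n_in:        # doubled width: input concatenated with itself
        return hidden + np.concatenate([x, x], axis=1)
    return hidden                # any other width: no shortcut at all

def embedding_net(s, neuron, rng):
    """Evaluate a randomly initialised net of widths [1] + neuron on the 1-D input s."""
    x = s.reshape(-1, 1)
    for n_in, n_out in zip([1] + neuron[:-1], neuron):
        w = rng.standard_normal((n_in, n_out))
        b = rng.standard_normal(n_out)
        x = embedding_layer(x, w, b)
    return x

rng = np.random.default_rng(0)
s = np.linspace(0.0, 1.0, 1501)  # sample points, like the rows of a compression table
for neuron in ([25, 50, 100], [50, 50, 50], [100, 50, 25], [1, 2, 4]):
    print(neuron, embedding_net(s, neuron, rng).shape)
```

Because every layer picks its shortcut from its own input and output widths, no structure triggers the 50-vs-200 addition error.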

ssrokyz added the bug label Dec 22, 2021

ssrokyz closed this as completed Dec 24, 2021

njzjz added duplicate and removed bug labels Jan 2, 2022

njzjz added enhancement and removed duplicate labels Jan 16, 2022

njzjz reopened this Jan 16, 2022

njzjz added a commit to njzjz/deepmd-kit that referenced this issue Oct 18, 2023

support compressing any neuron structure

d9a68c0

Fix deepmodeling#1370. Signed-off-by: Jinzhe Zeng <[email protected]>

njzjz mentioned this issue Oct 18, 2023

support compressing any neuron structure #2933

Merged

njzjz linked a pull request Oct 18, 2023 that will close this issue

support compressing any neuron structure #2933

Merged

wanghan-iapcm closed this as completed Oct 19, 2023
