
[BUG] Inconsistency between the batch_size specifications in input.json for tf and pt backends #3770

Closed
Yi-FanLi opened this issue May 11, 2024 · 1 comment

Comments

@Yi-FanLi
Collaborator

Yi-FanLi commented May 11, 2024

Bug summary

The TensorFlow backend allows specifying "batch_size" as a list. However, the PyTorch backend does not seem to accept this. Should the two backends behave consistently here?

DeePMD-kit Version

3.0.0a0

Backend and its version

PyTorch v2.0.0.post200-gc263bd43e8e

How did you download the software?

docker

Input Files, Running Commands, Error Log, etc.

The part that matters in input.json:

    "training_data": {
        "systems": [
            "O64H128"
        ],
        "batch_size": [
            1
        ]
    },
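As a workaround for the pt backend, passing a plain integer instead of a one-element list avoids the error. This is a sketch of the adjusted fragment (equivalent here, since there is only one system); whether a per-system list should also be supported is the subject of this issue:

```json
    "training_data": {
        "systems": [
            "O64H128"
        ],
        "batch_size": 1
    },
```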

Error log:

    [2024-05-11 04:57:30,934] DEEPMD INFO --------------------------------------------------------------------------
    /opt/deepmd-kit/lib/python3.11/site-packages/deepmd/utils/compat.py:362: UserWarning: The argument training->numb_test has been deprecated since v2.0.0. Use training->validation_data->batch_size instead.
      warnings.warn(
    Traceback (most recent call last):
      File "/opt/deepmd-kit/bin/dp", line 10, in <module>
        sys.exit(main())
                 ^^^^^^
      File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/main.py", line 805, in main
        deepmd_main(args)
      File "/opt/deepmd-kit/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
        return f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
      File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/pt/entrypoints/main.py", line 308, in main
        train(FLAGS)
      File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/pt/entrypoints/main.py", line 270, in train
        trainer = get_trainer(
                  ^^^^^^^^^^^^
      File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/pt/entrypoints/main.py", line 166, in get_trainer
        ) = prepare_trainer_input_single(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/pt/entrypoints/main.py", line 149, in prepare_trainer_input_single
        train_data_single = DpLoaderSet(
                            ^^^^^^^^^^^^
      File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/pt/utils/dataloader.py", line 116, in __init__
        system_dataloader = DataLoader(
                            ^^^^^^^^^^^
      File "/opt/deepmd-kit/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 357, in __init__
        batch_sampler = BatchSampler(sampler, batch_size, drop_last)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/deepmd-kit/lib/python3.11/site-packages/torch/utils/data/sampler.py", line 232, in __init__
        raise ValueError("batch_size should be a positive integer value, "
    ValueError: batch_size should be a positive integer value, but got batch_size=[1]
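The error arises because `torch.utils.data.DataLoader` only accepts a plain positive integer for `batch_size`, so a per-system list would have to be resolved to an int before the loader is constructed. A minimal sketch of such a normalization step (illustrative only, not DeePMD-kit's actual code; the helper name and signature are hypothetical):

```python
def normalize_batch_size(batch_size, system_index):
    """Accept either an int (shared by all systems) or a list with one
    entry per system, and return the plain positive int that
    torch.utils.data.DataLoader expects for this system."""
    if isinstance(batch_size, list):
        # Pick the entry corresponding to this system.
        batch_size = batch_size[system_index]
    if not isinstance(batch_size, int) or batch_size < 1:
        raise ValueError(f"batch_size must be a positive int, got {batch_size!r}")
    return batch_size
```

With this in place, `"batch_size": [1]` for the single system `O64H128` would resolve to `1` rather than being forwarded to `DataLoader` as a list.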

Steps to Reproduce

See the attached tarball.

Further Information, Files, and Links

issue_batch_size.tar.gz

@njzjz
Member

njzjz commented May 11, 2024

Duplicate of #3475

@njzjz njzjz marked this as a duplicate of #3475 May 11, 2024
@njzjz njzjz closed this as not planned (duplicate) May 11, 2024