
[BUG] Inconsistency between the batch_size specifications in input.json for tf and pt backends #3770

Closed
Yi-FanLi opened this issue May 11, 2024 · 1 comment

Comments

@Yi-FanLi
Collaborator

Yi-FanLi commented May 11, 2024

Bug summary

The TensorFlow backend allows specifying "batch_size" as a list. However, the PyTorch backend does not seem to accept this. Should the two backends behave consistently here?

DeePMD-kit Version

3.0.0a0

Backend and its version

PyTorch v2.0.0.post200-gc263bd43e8e

How did you download the software?

docker

Input Files, Running Commands, Error Log, etc.

The part that matters in input.json:

    "training_data": {
        "systems": [
            "O64H128"
        ],
        "batch_size": [
            1
        ]
    },
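As a workaround for the pt backend, passing a plain integer instead of a one-element list avoids the error. This is a sketch of the adjusted fragment (equivalent here, since there is only one system); whether a per-system list should also be supported is the subject of this issue:

```json
    "training_data": {
        "systems": [
            "O64H128"
        ],
        "batch_size": 1
    },
```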

Error log:

    [2024-05-11 04:57:30,934] DEEPMD INFO --------------------------------------------------------------------------
    /opt/deepmd-kit/lib/python3.11/site-packages/deepmd/utils/compat.py:362: UserWarning: The argument training->numb_test has been deprecated since v2.0.0. Use training->validation_data->batch_size instead.
      warnings.warn(
    Traceback (most recent call last):
      File "/opt/deepmd-kit/bin/dp", line 10, in <module>
        sys.exit(main())
                 ^^^^^^
      File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/main.py", line 805, in main
        deepmd_main(args)
      File "/opt/deepmd-kit/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
        return f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
      File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/pt/entrypoints/main.py", line 308, in main
        train(FLAGS)
      File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/pt/entrypoints/main.py", line 270, in train
        trainer = get_trainer(
                  ^^^^^^^^^^^^
      File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/pt/entrypoints/main.py", line 166, in get_trainer
        ) = prepare_trainer_input_single(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/pt/entrypoints/main.py", line 149, in prepare_trainer_input_single
        train_data_single = DpLoaderSet(
                            ^^^^^^^^^^^^
      File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/pt/utils/dataloader.py", line 116, in __init__
        system_dataloader = DataLoader(
                            ^^^^^^^^^^^
      File "/opt/deepmd-kit/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 357, in __init__
        batch_sampler = BatchSampler(sampler, batch_size, drop_last)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/deepmd-kit/lib/python3.11/site-packages/torch/utils/data/sampler.py", line 232, in __init__
        raise ValueError("batch_size should be a positive integer value, "
    ValueError: batch_size should be a positive integer value, but got batch_size=[1]
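The error arises because `torch.utils.data.DataLoader` only accepts a plain positive integer for `batch_size`, so a per-system list would have to be resolved to an int before the loader is constructed. A minimal sketch of such a normalization step (illustrative only, not DeePMD-kit's actual code; the helper name and signature are hypothetical):

```python
def normalize_batch_size(batch_size, system_index):
    """Accept either an int (shared by all systems) or a list with one
    entry per system, and return the plain positive int that
    torch.utils.data.DataLoader expects for this system."""
    if isinstance(batch_size, list):
        # Pick the entry corresponding to this system.
        batch_size = batch_size[system_index]
    if not isinstance(batch_size, int) or batch_size < 1:
        raise ValueError(f"batch_size must be a positive int, got {batch_size!r}")
    return batch_size
```

With this in place, `"batch_size": [1]` for the single system `O64H128` would resolve to `1` rather than being forwarded to `DataLoader` as a list.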

Steps to Reproduce

See the attached tarball.

Further Information, Files, and Links

issue_batch_size.tar.gz

@njzjz
Member

njzjz commented May 11, 2024

Duplicate of #3475

@njzjz njzjz marked this as a duplicate of #3475 May 11, 2024
@njzjz njzjz closed this as not planned (duplicate) May 11, 2024