Fixed batch_size_per_device and batch_size misuse in LazyLLM #377

JingofXin · 2024-12-03T07:51:10Z

Background

This PR fixes the misuse of batch_size_per_device and batch_size in LazyLLM. In the transformers code, the total training batch size is computed as total_train_batch_size = _train_batch_size * gradient_accumulation_steps * world_size.
Here, _train_batch_size corresponds to batch_size_per_device in Llamafactory, i.e. a per-GPU micro-batch size (micro_batch_size).
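The arithmetic can be sketched as below. The function name and the numbers (a requested global batch of 32 on 4 GPUs, gradient accumulation of 1) are hypothetical, and the sketch assumes the misuse consisted of forwarding the global batch_size unchanged as the per-device value. Under that assumption the effective batch grows with the number of GPUs.

```python
def total_train_batch_size(train_batch_size: int,
                           gradient_accumulation_steps: int,
                           world_size: int) -> int:
    """Effective (global) batch size per optimizer step, following the
    formula quoted from transformers:
    _train_batch_size * gradient_accumulation_steps * world_size."""
    return train_batch_size * gradient_accumulation_steps * world_size


# Hypothetical numbers: a global batch of 32 requested on 4 GPUs.
requested = 32
world_size = 4

# Assumed misuse: the global batch_size is passed as the per-device value,
# so every GPU processes 32 samples and the effective batch is 4x too large.
wrong = total_train_batch_size(requested, 1, world_size)

# Intended: each GPU processes requested // world_size samples.
right = total_train_batch_size(requested // world_size, 1, world_size)

print(wrong, right)  # 128 32
```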

[Screenshot: code in Transformers]

[Screenshot: log in Llamafactory and code in Transformers]

Solve

To resolve this, I have fixed gradient_accumulation_steps to 1 in LazyLLM. Given a batch_size, batch_size_per_device is now calculated as batch_size // n_gpus (not batch_size_per_device // n_gpus). This way the requested batch size is split across the available GPUs instead of being multiplied by their number.
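A minimal sketch of the rule described above, assuming a hypothetical helper named split_batch_size (it is not LazyLLM's actual function). The input checks are also an assumption added for illustration; the PR text only states the floor division and the fixed gradient accumulation.

```python
from typing import Tuple


def split_batch_size(batch_size: int, n_gpus: int) -> Tuple[int, int]:
    """Hypothetical helper: derive per-device settings from a global batch size.

    gradient_accumulation_steps is fixed to 1 and
    batch_size_per_device = batch_size // n_gpus, as described in this PR.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    gradient_accumulation_steps = 1
    # Floor division: any remainder samples are dropped from the global batch.
    batch_size_per_device = batch_size // n_gpus
    if batch_size_per_device == 0:
        raise ValueError("batch_size must be at least n_gpus")
    return batch_size_per_device, gradient_accumulation_steps


print(split_batch_size(32, 4))  # (8, 1)
print(split_batch_size(30, 4))  # (7, 1): effective batch is 7 * 4 = 28
```

Because the division floors, a batch_size that is not a multiple of n_gpus yields a slightly smaller effective batch; choosing a divisible batch_size avoids that.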

Verify

With 2 GPUs: OOM (out of memory)
With 4 GPUs: OK
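A small illustration of what the fix implies for these runs, assuming a hypothetical global batch size of 32 (the value used in the actual verification is not stated). With the global batch held constant, the per-GPU micro-batch halves when going from 2 to 4 GPUs, which is consistent with (though not proof of) the 2-GPU run running out of memory while the 4-GPU run fits. The loop also checks the relation bs = ms * ws * ga from the commit message.

```python
# Hypothetical global batch size; the value used in the PR's runs is not stated.
BATCH_SIZE = 32
GRADIENT_ACCUMULATION_STEPS = 1  # fixed to 1 by this PR

per_device = {}
for world_size in (2, 4):
    micro_batch_size = BATCH_SIZE // world_size
    # Relation from the commit message: bs = ms * ws * ga
    assert micro_batch_size * world_size * GRADIENT_ACCUMULATION_STEPS == BATCH_SIZE
    per_device[world_size] = micro_batch_size
    print(f"{world_size} GPUs -> {micro_batch_size} samples per GPU")
# 2 GPUs -> 16 samples per GPU
# 4 GPUs -> 8 samples per GPU
```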

Bug fixed: bs = ms * ws * ga (batch_size = micro_batch_size * world_size * gradient_accumulation_steps)

bf63e8b

wzh1994 merged commit b8bea00 into LazyAGI:main Dec 3, 2024
4 checks passed
