Hi, thank you for your great work. I was able to fine-tune the 7B version on a single A100 with great results. However, I'd like to extend the context length to 4096, which requires more GPU memory. When I run your script, it uses only GPU 0; GPUs 1 and 2 sit idle, so I get a CUDA out-of-memory error even though the work should be spread across all three GPUs. I tried both DistributedDataParallel and DataParallel with no success.
Any idea how to distribute this across all 3 GPUs effectively? Thank you again! :)
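For context on why DDP/DataParallel didn't help: both replicate the full model on every GPU, so they don't lower per-GPU memory. One approach that actually shards the model is PyTorch FSDP. Below is a minimal, hedged sketch (not the repo's own script; the `nn.Sequential` stand-in model, the learning rate, and the batch shape are placeholders for illustration), launched with `torchrun --nproc_per_node=3 train_fsdp.py`:

```python
# Minimal FSDP sketch: parameters, gradients, and optimizer state are sharded
# across ranks instead of replicated, which is what reduces per-GPU memory.
# The model below is a placeholder; swap in the actual 7B model from the script.

import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # Stand-in model for illustration only.
    model = nn.Sequential(
        nn.Linear(1024, 4096),
        nn.GELU(),
        nn.Linear(4096, 1024),
    ).cuda()
    model = FSDP(model)  # shard the module across all participating GPUs

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # One dummy training step to show the flow; replace with the real loop.
    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

DeepSpeed ZeRO stage 2/3 is another common option for the same goal, if the training script already integrates with it.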