Hi, thank you for your great work. I was able to fine-tune the 7B version on a single A100 with great results. However, I'd like to extend the context length to 4096, which requires more GPU memory. When I run your script, it uses only GPU 0; GPUs 1 and 2 sit idle, so I get a CUDA out-of-memory error even though the work should be spread across all three GPUs. I tried both DistributedDataParallel and DataParallel with no success.
Any idea how to distribute this across all 3 GPUs effectively? Thank you again! :)
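For context on why DDP/DataParallel didn't help: both replicate the full model on every GPU, so they don't lower per-GPU memory. One approach that actually shards the model is PyTorch FSDP. Below is a minimal, hedged sketch (not the repo's own script; the `nn.Sequential` stand-in model, the learning rate, and the batch shape are placeholders for illustration), launched with `torchrun --nproc_per_node=3 train_fsdp.py`:

```python
# Minimal FSDP sketch: parameters, gradients, and optimizer state are sharded
# across ranks instead of replicated, which is what reduces per-GPU memory.
# The model below is a placeholder; swap in the actual 7B model from the script.

import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # Stand-in model for illustration only.
    model = nn.Sequential(
        nn.Linear(1024, 4096),
        nn.GELU(),
        nn.Linear(4096, 1024),
    ).cuda()
    model = FSDP(model)  # shard the module across all participating GPUs

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # One dummy training step to show the flow; replace with the real loop.
    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

DeepSpeed ZeRO stage 2/3 is another common option for the same goal, if the training script already integrates with it.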