Hi Ayaka,
We are currently using your Llama-70B implementation for generation on Cloud TPUs and have run into a few challenges we hope you can help with. Converting the model to JAX format on the Cloud TPUs ran out of host memory; we worked around this by adding 400 GB of swap on an attached SSD. We now attach a disk containing the pre-converted model to all hosts in the TPU v3-32 slice in read-only mode.
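Incidentally, we suspect part of the problem is that the conversion materializes the whole checkpoint in host RAM at once. Here is a minimal sketch of the kind of per-shard conversion we have in mind (the paths and the .pth/.npy layout are hypothetical placeholders; they would need to be adapted to the real checkpoint format):

```python
import gc
import os

import numpy as np
import torch

ckpt_dir = "/mnt/llama-70b"    # hypothetical path to the PyTorch shards
out_dir = "/mnt/llama-70b-np"  # hypothetical output directory
os.makedirs(out_dir, exist_ok=True)

# Convert one shard at a time so peak host memory stays near one shard's size.
for shard in sorted(f for f in os.listdir(ckpt_dir) if f.endswith(".pth")):
    state = torch.load(os.path.join(ckpt_dir, shard), map_location="cpu")
    for name, tensor in state.items():
        # Save each tensor separately; float16 halves the footprint and
        # avoids bfloat16, which NumPy cannot represent natively.
        np.save(os.path.join(out_dir, f"{name}.npy"),
                tensor.to(torch.float16).numpy())
    del state
    gc.collect()  # release the shard before loading the next one
```

Something along these lines might have let us avoid the 400 GB swap workaround entirely.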
When we tried to shard the 70B model across the TPUs, we ran out of TPU HBM. We also noticed that with smaller models such as Llama-13B, all four hosts in the TPU v3-32 slice generated the same response redundantly. We would greatly appreciate any guidance on generating with Llama-70B on a TPU v3-32, or on alternative methods for generation on a single-host TPU v3-8.
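For reference, our understanding is that every host in a multi-host slice runs the same program, so host-side output has to be guarded by jax.process_index(), and parameters have to be sharded across the full device mesh rather than replicated per host. A rough sketch of both points (the mesh axis name, the sharding rule, and the params/generate symbols are hypothetical placeholders, not your repository's actual API):

```python
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One 1-D mesh over all 32 devices in the v3-32 slice (4 hosts x 8 chips each).
mesh = Mesh(np.array(jax.devices()), axis_names=("mp",))

def shard_param(x):
    """Shard 2-D weights along their last axis; replicate everything else."""
    if x.ndim == 2 and x.shape[-1] % mesh.devices.size == 0:
        spec = P(None, "mp")
    else:
        spec = P()  # small params (norms, biases) stay replicated
    sharding = NamedSharding(mesh, spec)
    # Build a global array from the host-local copy, shard by shard.
    return jax.make_array_from_callback(x.shape, sharding, lambda idx: x[idx])

# Hypothetical usage:
# params = jax.tree_util.tree_map(shard_param, params)
# tokens = generate(params, prompt_ids)

# All four hosts run this same script, so unguarded printing produces
# four copies of every response; decode/print on process 0 only.
if jax.process_index() == 0:
    print("host 0 output only")  # e.g. tokenizer.decode(tokens)
```

Is this the kind of setup you would recommend, or does the repository expect a different sharding layout?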
We would also like to thank you for this exceptional repository; it has significantly accelerated our research. Generation on these TPUs with your implementation is remarkably fast compared to GPUs! We plan to acknowledge your valuable contribution in our upcoming paper. Thank you once again for your outstanding work.
Hi @divyapatel4, I am busy with other matters in January, so I may have little time to look into this issue. Have you tried the new Llama JAX implementation in the Hugging Face transformers library, and does that work for you?
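For reference, a minimal sketch of what I mean (the checkpoint name is only an example, and class availability depends on your transformers version):

```python
from transformers import AutoTokenizer, FlaxLlamaForCausalLM

model_id = "openlm-research/open_llama_3b"  # example checkpoint only
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = FlaxLlamaForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="np")
out = model.generate(inputs["input_ids"], max_length=32)
print(tokenizer.decode(out.sequences[0], skip_special_tokens=True))
```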