My multi gpu is not getting detected? #1935
-
Hey, I did a test with your config. I'm not sure if this was a typo, but your …
-
Hi, I'm currently studying how to increase the context size of a model. Let me clarify my understanding: the model's maximum context size is determined by the RoPE (Rotary Position Embedding) configuration at the time of training. For example, the Qwen/Qwen2.5-1.5B model was trained with …
Thanks for helping me clarify!
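For reference, the RoPE settings the question is asking about live in the model's Hugging Face config.json. A minimal sketch of the two relevant fields, written here in YAML for readability; the values are assumptions for illustration rather than quotes from the Qwen repo:

```yaml
# Fields in a model's config.json that fix its native context window.
# Values are illustrative assumptions, not copied from Qwen/Qwen2.5-1.5B.
max_position_embeddings: 32768   # context length the model was trained for
rope_theta: 1000000.0            # RoPE base frequency used during pretraining
```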
-
Hey, my bad there. I fixed my message a bit on the numbers as well.
I mean,
sequence_len: 32768
is the one you should set to use the model's full context listed above. The rope_theta param within the model's config (linked above) is what allows it to scale from 32k to its long context, i.e. set the below and your model would train fine and support 100k seq during inference :)
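A minimal axolotl-style sketch of such a config; only sequence_len comes from this thread, while base_model and sample_packing are assumptions added so the excerpt is self-contained:

```yaml
# Illustrative axolotl-style config excerpt for training at the full native context.
# Only sequence_len is taken from the discussion; the other keys are assumptions.
base_model: Qwen/Qwen2.5-1.5B
sequence_len: 32768      # train at the model's full 32k native context
sample_packing: true     # pack shorter samples so the 32k window is actually filled
```

Since rope_theta already lives in the model's own config, it would not need to be overridden in the training config for the longer inference context mentioned above.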