When I modify batch_size, the prefill length, and the decode length in benchmark.py, the reported VRAM usage stays the same. I expected VRAM usage to vary with these settings. Does anyone know why this happens? Any insights would be greatly appreciated!
Also, if I fix the batch size and vary the prefill and decode lengths, Prefill tokens/s and Decode tokens/s barely change, which I find quite strange.
Could these two issues be related?
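One possibility worth checking (an assumption about benchmark.py, not something confirmed from its code): many inference benchmarks preallocate the KV cache at the model's maximum sequence length, so the reported VRAM is dominated by weights plus a fixed-size cache and does not change when you shorten the prefill or decode lengths. A back-of-the-envelope estimate of KV-cache size, using hypothetical Llama-7B-like parameters (`num_layers`, `num_kv_heads`, `head_dim` are illustrative, not taken from the benchmark):

```python
def kv_cache_bytes(batch, seq_len, num_layers=32, num_kv_heads=32,
                   head_dim=128, dtype_bytes=2):
    """Rough size of a dense KV cache in bytes.

    The factor of 2 accounts for storing both keys and values;
    dtype_bytes=2 assumes fp16/bf16 activations.
    """
    return 2 * num_layers * batch * seq_len * num_kv_heads * head_dim * dtype_bytes


# If the cache is preallocated at a fixed max length (e.g. 2048),
# the allocation is identical no matter how many tokens you actually
# prefill or decode -- which would explain constant VRAM readings.
MAX_SEQ_LEN = 2048
preallocated = kv_cache_bytes(batch=1, seq_len=MAX_SEQ_LEN)
print(f"preallocated KV cache: {preallocated / 2**20:.0f} MiB")
```

If this hypothesis holds, VRAM should still scale with batch_size unless the benchmark also pads to a fixed maximum batch; comparing `torch.cuda.max_memory_allocated()` before and after a run at different settings would distinguish preallocation from a measurement bug.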