Same Memory (VRAM) with different batch_size, Prefill Length, Decode Length. #691

Open
rayzr0123 opened this issue Jan 15, 2025 · 0 comments

@rayzr0123

When I modify batch_size, the prefill length, and the decode length in benchmark.py, the reported VRAM usage is always the same. I expected VRAM usage to vary with these settings. Does anyone know why this happens? Any insights would be greatly appreciated!
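
For context, here is a minimal sketch of the scaling I expected (assumes PyTorch with a CUDA device; the model dimensions are illustrative placeholders, not taken from this repo). If the KV cache is allocated per request, peak allocated memory should grow with batch size and sequence length:

```python
import torch

# Illustrative model dimensions (placeholders; adjust to the actual model).
N_LAYERS, N_KV_HEADS, HEAD_DIM, DTYPE_BYTES = 32, 32, 128, 2

def kv_cache_bytes(batch, seq_len):
    # K and V per layer, each shaped [batch, n_kv_heads, seq_len, head_dim].
    return 2 * N_LAYERS * batch * N_KV_HEADS * seq_len * HEAD_DIM * DTYPE_BYTES

for batch, seq_len in [(1, 512), (4, 512), (1, 2048)]:
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    # Stand-in for a generate() call: allocate a dynamically sized KV cache.
    cache = [torch.empty(batch, N_KV_HEADS, seq_len, HEAD_DIM,
                         dtype=torch.float16, device="cuda")
             for _ in range(2 * N_LAYERS)]
    print(f"batch={batch} seq={seq_len}: "
          f"expected {kv_cache_bytes(batch, seq_len) / 2**20:.0f} MiB, "
          f"peak {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB")
    del cache
```

A constant reading makes me wonder whether the cache is pre-allocated at a fixed maximum size, or whether the reported number comes from nvidia-smi, where PyTorch's caching allocator keeps freed blocks reserved and can hide the difference.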

Also, if I fix the batch size and change the prefill and decode lengths, Prefill tokens/s and Decode tokens/s barely change, which I find quite strange.
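
To rule out a timing artifact on my side, this is how I understand tokens/s should be measured (a minimal sketch with a stand-in matmul workload; the real call would be the prefill or decode step). CUDA launches are asynchronous, so without synchronization the timer can stop before the GPU has actually finished, which would flatten any differences:

```python
import time

import torch

def tokens_per_second(fn, n_tokens):
    torch.cuda.synchronize()   # drain previously queued kernels
    start = time.perf_counter()
    fn()
    torch.cuda.synchronize()   # wait for the timed work to actually finish
    return n_tokens / (time.perf_counter() - start)

# Stand-in workload (assumes a CUDA device is available).
x = torch.randn(4096, 4096, device="cuda")
print(f"{tokens_per_second(lambda: x @ x, 512):.1f} tok/s (stand-in)")
```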

I wonder whether these two issues are related.
