Optimize quantized masked fill #162
Code Metrics Report

| Language | Files | Lines | Blanks | Comments | Code | Complexity |
|----------|-------|-------|--------|----------|------|------------|
| Rust | 60 | 19995 | 1439 | 821 | 17735 | 1128 |
| Total | 60 | 19995 | 1439 | 821 | 17735 | 1128 |

- Estimated Cost to Develop: 53,182
- Estimated Schedule Effort: 10.982569 months
- Estimated People Required: 4.474866
- Processed 676190 bytes, 0.676 megabytes (SI)
It reduces completion speed too much. I wonder whether that's a real regression or just bad measurements. I'm profiling it.
It was a measurement issue, fixed by #163.
@lucasavila00, I just merged #163. Can you please update the benchmarks?
@EricLBuehler I just did. I'm having a hard time measuring small changes reliably. I ran it a bunch of times and completion speed is unchanged. Prompt speed improved. I think we need something like https://github.com/ggerganov/llama.cpp/blob/master/examples/llama-bench/README.md instead of the --prompt setup for benchmarking. I'm also using my local GPU, with bad cooling, other processes using it, etc. The llama-bench setup runs a bunch of repetitions to remove this noise. I created an issue for it: #164
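As a rough illustration of the repetition idea, here is a minimal sketch of a harness that runs a workload several times after a warmup and reports the mean and sample standard deviation. The names `bench`, `summarize`, and `BenchStats` are hypothetical; this is not llama-bench's implementation nor an existing API in this repository, and the summed range in `main` is only a stand-in for a real prompt or completion run.

```rust
use std::time::{Duration, Instant};

/// Summary statistics over repeated timings, in seconds (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct BenchStats {
    mean: f64,
    std_dev: f64,
}

/// Mean and sample standard deviation of the given durations.
fn summarize(samples: &[Duration]) -> BenchStats {
    assert!(!samples.is_empty(), "need at least one sample");
    let secs: Vec<f64> = samples.iter().map(|d| d.as_secs_f64()).collect();
    let n = secs.len() as f64;
    let mean = secs.iter().sum::<f64>() / n;
    // Sample variance (n - 1 denominator); zero when there is a single sample.
    let var = if secs.len() > 1 {
        secs.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / (n - 1.0)
    } else {
        0.0
    };
    BenchStats { mean, std_dev: var.sqrt() }
}

/// Runs `f` `warmup` times untimed, then `reps` times timed.
fn bench<F: FnMut()>(warmup: usize, reps: usize, mut f: F) -> BenchStats {
    for _ in 0..warmup {
        f();
    }
    let samples: Vec<Duration> = (0..reps)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    summarize(&samples)
}

fn main() {
    // Placeholder workload; a real benchmark would run the model here.
    let stats = bench(2, 10, || {
        let v: u64 = (0..100_000u64).sum();
        std::hint::black_box(v);
    });
    println!("{:.6}s ± {:.6}s", stats.mean, stats.std_dev);
}
```

Reporting the spread alongside the mean makes it easier to tell whether a small difference between two branches is larger than the run-to-run noise.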
That is a great idea, I think that we should add that in light of future improvements. I'll merge this as I think the performance gains are very significant!
Closes #161
This MR:
Master: