Change gqa to use repeat instead of concatenate #443

angeloskath · 2024-02-15T01:37:27Z

Simplify the GQA key/value repeat and gain ~1.5 tokens per second (tps) on quantized Mistral 7B.
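For readers who have not opened the diff, here is a rough sketch of the idea, not the PR's actual code. It assumes keys and values are shaped `(batch, n_kv_heads, seq_len, head_dim)` and that the old path expanded, concatenated, and reshaped them to match the number of query heads. NumPy stands in for MLX so the sketch runs anywhere; the function names `repeat_kv_concat` and `repeat_kv_repeat` are hypothetical.

```python
import numpy as np


def repeat_kv_concat(a: np.ndarray, repeats: int) -> np.ndarray:
    """Concatenate-based variant (assumed shape of the old approach)."""
    batch, n_kv_heads, seq_len, head_dim = a.shape
    # Insert a new axis after the head axis, then stack `repeats` copies along it.
    stacked = np.concatenate([np.expand_dims(a, 2)] * repeats, axis=2)
    # Fold the copies back into the head axis: (B, n_kv_heads * repeats, L, D).
    return stacked.reshape(batch, n_kv_heads * repeats, seq_len, head_dim)


def repeat_kv_repeat(a: np.ndarray, repeats: int) -> np.ndarray:
    """Repeat-based variant: one call along the head axis, no reshape needed."""
    if repeats == 1:
        return a
    return np.repeat(a, repeats, axis=1)


# Hypothetical sizes: 2 KV heads shared by 8 query heads, so 4 repeats.
keys = np.arange(2 * 2 * 3 * 4, dtype=np.float32).reshape(2, 2, 3, 4)
old = repeat_kv_concat(keys, 4)
new = repeat_kv_repeat(keys, 4)
assert old.shape == new.shape
assert np.array_equal(old, new)
```

Both variants place the copies of each KV head next to each other, so the result is element-for-element the same. The repeat form is a single op and skips the intermediate list of expanded arrays and the reshape.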

awni

Simpler and faster, very nice!

I also see the perils here of having a different implementation file for each model.

Change gqa to use repeat instead of concatenate

d1d045b

awni approved these changes Feb 15, 2024

awni merged commit f71e965 into main Feb 15, 2024

awni deleted the repeat-gqa branch February 15, 2024 01:40

mzbac mentioned this pull request Mar 3, 2024

Add Starcoder 2 #502

Merged

Blaizzy mentioned this pull request Mar 3, 2024

Starcoder2: Update config and change GQA to use repeat #520

Merged

Blaizzy pushed a commit to Blaizzy/mlx-examples that referenced this pull request Mar 13, 2024

Change gqa to use repeat instead of concatenate (ml-explore#443)

9edb7ed
