Add Zamba #30950
done: 4b0fb52
This should be very similar to Gemma and can probably be copied from it entirely!
Yes, that is right. In more detail, we originally copied this from Jamba and adapted it to Zamba by removing the lines related to expert routers (Zamba has no mixture of experts, so it has no routers). We have now slightly updated the class to reflect recent changes in upstream transformers and added
# Adapted from transformers.models.jamba.modeling_jamba.JambaForCausalLM with Jamba->Zamba, JAMBA->ZAMBA
Zyphra@91bc076#diff-0f4d89960530c068a10af906f1958ed46e3e5f2ff937d6be61517478f383b074R1349
The comment mentions Jamba instead of Gemma because there are a few differences from GemmaForCausalLM in the prepare_inputs_for_generation method, since we use HybridMambaAttentionDynamicCache.
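For readers unfamiliar with the "with Jamba->Zamba, JAMBA->ZAMBA" note, it reads as a rename recipe applied to the Jamba source. The sketch below illustrates only that idea. The helper `apply_replacements` and the sample snippet (including the `JAMBA_EXAMPLE_CONSTANT` name) are hypothetical and are not taken from the transformers codebase or from this PR.

```python
def apply_replacements(source: str, pattern: str) -> str:
    """Apply comma-separated 'Old->New' pairs to source text, in order.

    Hypothetical helper for illustration; not the tooling used by transformers.
    """
    for pair in pattern.split(","):
        old, new = (part.strip() for part in pair.split("->"))
        source = source.replace(old, new)
    return source


# Made-up snippet standing in for Jamba code; JAMBA_EXAMPLE_CONSTANT is an assumed name.
jamba_snippet = (
    "class JambaForCausalLM(JambaPreTrainedModel):\n"
    "    doc = JAMBA_EXAMPLE_CONSTANT\n"
)

zamba_snippet = apply_replacements(jamba_snippet, "Jamba->Zamba, JAMBA->ZAMBA")
print(zamba_snippet)
```

A plain rename like this covers only the naming; the manual changes described above (removing the expert-router lines, the cache-specific generation inputs) are why the note says "Adapted from" rather than claiming an exact copy.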
OK, that's good enough. Thanks for the detailed explanation.