If Alibi is on, we should turn learned_pos_emb to False #489
…earned_pos_emb to false
Similarly, the argument could be made that we should maybe add a warning or something?
Hm... Okay, so one thing is if we train models w/ ALiBi (without I also am not 100% sure what's the right thing. The "safest" thing might be to add a warning that we're changing it, since if a user specifies ALiBi they usually mean the case where you don't add in the positional embedding.
If we load a trained model, should the model config be part of the ckpt? In general, the change you made + a warning sounds good.
Would a good middle ground be changing the default value so that
@vchiley It might be good to save it independently from Composer, since the Composer checkpoint is also unwieldy. @samhavens I agree, so I did something like this:
And it leads to this super weird warning... ):
Let me know if I'm doing something obviously wrong (I think there's something weird in transformers where the class gets initialized twice...):
And we also can't set this up in
So I think, unfortunately, I'm just going to add an additional warning, unless someone else has a way around this.
@bcui19 it sets it to false, then sets it to true?
I think... When you create MPTConfig the way we do it now, HF does something weird and it ends up creating two
If HF is going to make us choose between extremely cryptic warnings like that and just "set pos_emb to false if alibi is true", then I guess we should revert to what you had before and just let the user know they specified alibi so we are setting pos emb to false. Unless, does setting the default differently solve this?
If we use alibi, we need to set learned_pos_emb to False. Otherwise we still go down the learned_pos_emb codepath.
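For illustration, the "override plus warning" behavior discussed in this thread could look roughly like the sketch below. ToyConfig is a hypothetical stand-in written for this example, not the real MPTConfig, and the warning text is an assumption; only the alibi and learned_pos_emb field names come from the PR.

```python
import warnings
from dataclasses import dataclass


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a model config; NOT the real MPTConfig."""

    alibi: bool = False
    learned_pos_emb: bool = True

    def __post_init__(self):
        # ALiBi encodes position through attention biases, so keeping a
        # learned positional embedding alongside it is usually unintended.
        if self.alibi and self.learned_pos_emb:
            # Assumed wording; the message in the actual PR may differ.
            warnings.warn(
                "alibi is turned on, setting learned_pos_emb to False.",
                UserWarning,
            )
            self.learned_pos_emb = False


# With alibi on, the learned positional embedding is switched off.
cfg = ToyConfig(alibi=True)
print(cfg.learned_pos_emb)
```

Doing the override in one place at construction time means downstream model code only has to check learned_pos_emb. It does not address the double-initialization issue mentioned above, which is specific to how HF builds the config.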