Thank you for this great package!
I tried modifying the example "Copy Task" code to have a 50% chance of producing a 9-token string and otherwise a 10-token string:
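Roughly, the change looks like this (a reconstructed sketch, not the exact example code; `generate_batch`, `NUM_TOKENS`, `PAD_ID`, and the padding scheme are my naming assumptions):

```python
import random
import torch

NUM_TOKENS = 256   # assumed vocab size from the copy-task example
MAX_LEN = 10
PAD_ID = 0         # assumed padding token id

def generate_batch(batch_size):
    seqs = []
    for _ in range(batch_size):
        # 50% chance of a 9-token sequence, otherwise 10 tokens
        length = 9 if random.random() < 0.5 else 10
        seq = torch.randint(1, NUM_TOKENS, (length,))
        if length < MAX_LEN:
            # pad short sequences so the batch stacks into one tensor
            pad = torch.full((MAX_LEN - length,), PAD_ID, dtype=torch.long)
            seq = torch.cat([seq, pad])
        seqs.append(seq)
    src = torch.stack(seqs)
    mask = src != PAD_ID   # True at real tokens, False at padding
    return src, mask
```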
When I train this, the model learns to always produce a 10-token string.
I originally noticed this when I changed the code to produce only 1- or 2-token sequences, and after training it likewise only ever produced 2-token sequences. I suspect there is some issue with masking or maybe with the loss function, but I haven't figured it out yet.
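One way a masking bug could produce exactly this behavior (a guess, not verified against the package internals): if the padding id doubles as the only stop signal and is excluded from the loss via `ignore_index`, then the final position of a 9-token example contributes no gradient, so the model is never penalized for emitting a 10th token. A minimal sketch of the contrast, with `PAD_ID` and `EOS_ID` as hypothetical ids:

```python
import torch
import torch.nn.functional as F

PAD_ID = 0    # hypothetical padding id
EOS_ID = 255  # hypothetical id reserved for end-of-sequence

# Targets for one 9-token example padded out to length 10.
targets = torch.tensor([[5, 3, 7, 2, 9, 4, 1, 8, 6, PAD_ID]])
logits = torch.randn(1, 10, 256)  # (batch, seq_len, vocab)

# If PAD is the only stop marker but is ignored by the loss, the
# final position carries no gradient: the model can emit a 10th
# real token and still drive the loss on the other positions to 0.
loss_no_stop_signal = F.cross_entropy(
    logits.transpose(1, 2), targets, ignore_index=PAD_ID
)

# Writing an explicit EOS target (distinct from PAD) into the last
# position puts "where to stop" back into the loss.
targets_with_eos = targets.clone()
targets_with_eos[0, 9] = EOS_ID
loss_with_stop_signal = F.cross_entropy(
    logits.transpose(1, 2), targets_with_eos, ignore_index=PAD_ID
)
```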
FWIW, the loss never gets extremely low (~1e-5) the way it does when training only with 10-token sequences; it plateaus at about 0.5.