Patch release v4.42.3

@ArthurZucker ArthurZucker released this 28 Jun 15:35

Make sure we have attention softcapping for the "eager" Gemma2 model

After experimenting, we noticed that attention softcapping is a must, mostly for the 27b model. This release adds it back to the "eager" attention path. It should have been there all along, but an error on my side made it disappear. Sorry all! 😭

  • Gemma capping is a must for big models (#31698)
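
For readers unfamiliar with the term, here is a minimal sketch of what attention softcapping usually means: the raw attention logits are squashed into the range (-cap, cap) with a scaled tanh before the softmax. This is an illustration in plain Python, not the code from the PR. The function names and the cap value of 50.0 are assumptions chosen for the example.

```python
import math

def softcap(scores, cap=50.0):
    """Squash each logit into (-cap, cap) using cap * tanh(x / cap).

    `cap=50.0` is an illustrative value assumed for this sketch.
    """
    return [cap * math.tanh(s / cap) for s in scores]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def eager_attention_weights(scores, cap=None):
    """Hypothetical helper: apply optional softcapping, then softmax."""
    if cap is not None:
        scores = softcap(scores, cap)
    return softmax(scores)

# Large logits are bounded by the cap; small ones pass through almost unchanged.
capped = softcap([1000.0, -1000.0, 1.0])
weights = eager_attention_weights([1000.0, 2.0, -3.0], cap=50.0)
```

Because tanh is close to the identity near zero, small logits are barely affected, while very large logits can no longer dominate the softmax without bound. That is one plausible reason the effect would show up mostly on larger models.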