I was curious what you think of BitNet, and whether llm.c is a place where experimenting with it could be facilitated. The papers were extremely promising and got a lot of traction, but while there have been a few (small-scale) reproductions, there isn't an easy ramp to start experimenting with it.
I don't think we have it on the current roadmap; Andrej can chime in. We have a lot of stuff on the backlog before we get there, including potentially supporting FP8, ZeRO stage 2, etc.
The problem with BitNet (b1.58) training is that it still uses FP16/BF16 master weights, so memory consumption during training does not decrease. Anyway, getting support for it would be great! If combined with FP8 training it could bring improvements.
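To make the memory point concrete: in b1.58 the forward pass quantizes a full-precision latent weight tensor to {-1, 0, +1} with a per-tensor absmean scale, while the optimizer keeps updating the FP16/BF16 master copy (gradients flow through a straight-through estimator). Here is a minimal C sketch of just that quantization step, assuming a hypothetical `quantize_absmean_ternary` helper rather than anything that exists in llm.c:

```c
#include <math.h>
#include <stddef.h>

// Hypothetical sketch of BitNet b1.58's absmean weight quantization:
// Wq = clip(round(W / (mean(|W|) + eps)), -1, +1), with the absmean
// kept as a per-tensor scale. The full-precision master weights `w`
// stay around for the optimizer, which is why training memory does
// not shrink even though inference weights become ternary.
void quantize_absmean_ternary(const float* w, signed char* wq,
                              float* scale, size_t n) {
    float absmean = 0.0f;
    for (size_t i = 0; i < n; i++) absmean += fabsf(w[i]);
    absmean = absmean / (float)n + 1e-6f;  // eps avoids divide-by-zero
    *scale = absmean;
    for (size_t i = 0; i < n; i++) {
        float v = roundf(w[i] / absmean);   // round to nearest integer
        v = fminf(1.0f, fmaxf(-1.0f, v));   // clip to [-1, +1]
        wq[i] = (signed char)v;             // ternary code {-1, 0, +1}
    }
}
```

Only `wq` and `scale` would need to be stored at inference time, which is where the memory savings show up; during training the float master weights (plus optimizer state) still dominate.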
First of all, thanks. We need more ramps.
Papers:
- BitNet: Scaling 1-bit Transformers for Large Language Models (arXiv:2310.11453)
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764)