I used the aqt_einsum function in the code to quantize only the qk scores, then trained the model. However, I found that after a certain number of steps (around 200), the loss dropped very slowly, which is quite different from the loss curve of bfloat16 training. Am I missing something? For example, does the backward pass need some additional handling?
PS: I am training with jax==0.4.23 on a TPU v5p-8.
In other words, is there a training example for AQT int8 in Pax?
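For context, here is a minimal sketch of the kind of setup being described, written in plain JAX rather than with AQT's own API: symmetric per-tensor int8 fake quantization applied to both operands of the qk-score einsum, with a straight-through estimator so gradients still flow through the rounding step. The function names and the per-tensor scaling scheme are assumptions for illustration, not AQT's actual internals.

```python
import jax
import jax.numpy as jnp

def int8_fake_quant(x):
    # Symmetric per-tensor int8 fake quantization (assumed scheme).
    scale = jnp.maximum(jnp.max(jnp.abs(x)) / 127.0, 1e-8)
    q = jnp.clip(jnp.round(x / scale), -127, 127)
    # Straight-through estimator: forward uses the dequantized value,
    # backward treats the quantization as the identity.
    return x + jax.lax.stop_gradient(q * scale - x)

def quantized_qk_scores(q, k):
    # Quantize both operands before the attention-score einsum.
    return jnp.einsum('bqd,bkd->bqk', int8_fake_quant(q), int8_fake_quant(k))

q = jax.random.normal(jax.random.PRNGKey(0), (2, 4, 8))
k = jax.random.normal(jax.random.PRNGKey(1), (2, 4, 8))
scores = quantized_qk_scores(q, k)
# Gradients flow despite the non-differentiable rounding:
grads = jax.grad(lambda q: quantized_qk_scores(q, k).sum())(q)
```

With a straight-through estimator like this, no extra backward handling is needed; whether the slow loss decrease comes from the quantization noise itself or from a configuration issue is exactly the question raised above.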