Hi, thanks for your great project! I'm curious about the rematerialization (remat) mechanism in this project. Could you explain how it works when the option is turned on? For example, does it follow Megatron's approach, where all operators are recomputed during the backward pass to save memory?
If turned on, Alpa applies remat to each 'layer': it wraps all JaxprEqns of a layer's forward computation with JAX's remat_call_p. When JAX traces the computation and encounters remat_call_p, it automatically generates the corresponding recomputation in the backward pass. Since every equation belongs to a 'layer', the answer to

> will the mechanism follow with the Megatron that all operators will be re-computed during backward

is yes.
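To make this concrete, here is a minimal sketch in plain JAX (not Alpa's internal code) of what wrapping a layer's forward pass for rematerialization does: the intermediates are dropped after the forward pass and recomputed when the backward pass needs them. The function and variable names below are illustrative, not taken from Alpa.

```python
import jax
import jax.numpy as jnp

def layer_forward(params, x):
    # A toy "layer": these intermediates would normally be saved for backward.
    h = jnp.tanh(x @ params["w1"])
    return jnp.tanh(h @ params["w2"])

# Without remat: intermediates of layer_forward are stored for the backward pass.
def loss_no_remat(params, x):
    return jnp.sum(layer_forward(params, x) ** 2)

# With remat: jax.checkpoint (a.k.a. jax.remat) drops the intermediates and
# recomputes them during backward, trading extra FLOPs for lower peak memory --
# the same trade-off Megatron's activation recomputation makes.
def loss_with_remat(params, x):
    return jnp.sum(jax.checkpoint(layer_forward)(params, x) ** 2)

params = {"w1": jnp.ones((4, 4)), "w2": jnp.ones((4, 4))}
x = jnp.ones((2, 4))
g1 = jax.grad(loss_no_remat)(params, x)
g2 = jax.grad(loss_with_remat)(params, x)  # same gradients, lower activation memory
```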
A 'layer' can be assigned manually or generated automatically. To assign layers manually, add mark_pipeline_boundary() between two layers and set the layer_option of PipeshardParallel to a ManualLayerOption with remat_layer=True. Otherwise, set layer_option to AutoLayerOption to let Alpa's layer-clustering algorithm slice the computation into layers and apply remat accordingly. A sketch of both configurations is shown below.
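A hedged sketch of the two configurations described above; the toy layers, parameter names, and layer_num value are illustrative, and exact argument names may differ across Alpa versions.

```python
import alpa
import jax.numpy as jnp
from alpa import PipeshardParallel, ManualLayerOption, AutoLayerOption

# Two toy "layers"; a real model would go here instead.
def layer1_apply(params, x):
    return jnp.tanh(x @ params["w1"])

def layer2_apply(params, h):
    return h @ params["w2"]

# 1) Manual layers: mark the boundaries yourself and enable remat per marked layer.
manual_method = PipeshardParallel(
    num_micro_batches=16,
    layer_option=ManualLayerOption(remat_layer=True),
)

# 2) Automatic layers: let Alpa's layer-clustering algorithm slice the computation
#    into layers (layer_num is illustrative) and apply remat to each of them.
auto_method = PipeshardParallel(
    num_micro_batches=16,
    layer_option=AutoLayerOption(layer_num=2),
)

@alpa.parallelize(method=manual_method)
def train_step(params, batch):
    def loss_fn(params):
        h = layer1_apply(params, batch["x"])
        alpa.mark_pipeline_boundary()  # boundary between the two manual layers
        out = layer2_apply(params, h)
        return ((out - batch["y"]) ** 2).mean()

    # alpa.grad is used in place of jax.grad for pipeline-parallel training.
    return alpa.grad(loss_fn)(params)
```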
There are still some known issues related to remat + RNG that are work in progress: #535 and #592.