This repository has been archived by the owner on Jul 31, 2024. It is now read-only.

comm.py should maybe consider backend-specific support of different devices #30

Open
lucaslie opened this issue Jan 14, 2022 · 0 comments

lucaslie commented Jan 14, 2022

Depending on the backend, distributed communication may only be supported on the CPU or on the GPU; see the backend support table in the PyTorch distributed documentation.

Right now, communication in comm.py is always done on the GPU; see, e.g.:

# serialize
if context.rank() == src:
    tensor = _serialize(obj).cuda()

I would suggest handling backend-specific device support in both allgather() and broadcast() so that the functions are usable across multiple backends (a sketch of one possible approach follows below).
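As a minimal sketch, backend-specific device selection could look something like this. The helper name _comm_device is hypothetical; context.rank() and _serialize are the existing helpers from comm.py, and the NCCL/Gloo behavior assumed here follows the PyTorch backend table:

import torch
import torch.distributed as dist

def _comm_device() -> torch.device:
    # NCCL only supports CUDA tensors; Gloo (and typically MPI) can
    # communicate CPU tensors, so fall back to the CPU for those backends.
    if dist.get_backend() == dist.Backend.NCCL:
        return torch.device("cuda", torch.cuda.current_device())
    return torch.device("cpu")

# serialize (adapted from the snippet above; context and _serialize are
# the existing helpers in comm.py)
if context.rank() == src:
    tensor = _serialize(obj).to(_comm_device())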

torch.distributed.broadcast_object_list and torch.distributed.all_gather_object might be useful starting points for this.
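For reference, a hedged sketch of how those built-ins could replace the manual serialize-and-move path: they have been available since PyTorch 1.8, pickle arbitrary Python objects, and handle backend-appropriate device placement internally (src and obj here are reused from the snippet above):

import torch.distributed as dist

# Broadcast: every rank passes a list of the same length; after the call,
# non-src ranks receive src's object.
objects = [obj if dist.get_rank() == src else None]
dist.broadcast_object_list(objects, src=src)
obj = objects[0]

# All-gather: collect one object from every rank into `gathered`.
gathered = [None for _ in range(dist.get_world_size())]
dist.all_gather_object(gathered, obj)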

@zhijian-liu zhijian-liu self-assigned this Jan 22, 2022
@zhijian-liu zhijian-liu added the enhancement New feature or request label Jan 22, 2022