What is the use of "AllReduce"? #21

yyk-wew · 2023-03-09T15:04:30Z

Hello. Thank you for your great work!

I have some questions about the "AllReduce" class defined here.

msn/src/utils.py

Lines 226 to 241 in 4388dc1

    class AllReduce(torch.autograd.Function):

        @staticmethod
        def forward(ctx, x):
            if (
                dist.is_available()
                and dist.is_initialized()
                and (dist.get_world_size() > 1)
            ):
                x = x.contiguous() / dist.get_world_size()
                dist.all_reduce(x)
            return x

        @staticmethod
        def backward(ctx, grads):
            return grads

It is used to average the probabilities across processes when computing the me-max regularization.

msn/src/losses.py

Lines 70 to 72 in 4388dc1

    if me_max:
        avg_probs = AllReduce.apply(torch.mean(probs, dim=0))
        rloss = - torch.sum(torch.log(avg_probs**(-avg_probs))) + math.log(float(len(avg_probs)))
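The `rloss` expression can be easier to read when expanded. Writing $\bar{p}$ for `avg_probs` and $K$ for `len(avg_probs)` (symbols introduced here for illustration, not taken from the repository), and assuming the entries of $\bar{p}$ are positive and sum to 1:

```latex
\begin{align}
\text{rloss}
  &= -\sum_{k=1}^{K} \log\!\left(\bar{p}_k^{\,-\bar{p}_k}\right) + \log K \\
  &= \sum_{k=1}^{K} \bar{p}_k \log \bar{p}_k + \log K \\
  &= \log K - H(\bar{p}),
  \qquad H(\bar{p}) = -\sum_{k=1}^{K} \bar{p}_k \log \bar{p}_k .
\end{align}
```

Under that assumption, minimizing `rloss` amounts to maximizing the entropy $H(\bar{p})$ of the averaged prediction, and the $\log K$ offset makes the term zero when $\bar{p}$ is uniform.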

I wonder why "dist.all_reduce(x)" is not used directly. It seems that using "AllReduce" multiplies the gradient by "world_size".
I would like to know whether I am correct and, if so, why this makes sense.

Thx!
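The scaling in question can be checked without launching multiple processes. The following is a minimal sketch, not code from the repository: it simulates `world_size` ranks in a single process by stacking hypothetical per-rank vectors, and the names `me_max_reg` and `compare_gradients` are made up for this illustration. It assumes every rank evaluates the same regularizer on the same averaged vector.

```python
import math

import torch


def me_max_reg(avg_probs):
    # Same expression as the rloss line quoted from losses.py.
    return -torch.sum(torch.log(avg_probs ** (-avg_probs))) + math.log(float(len(avg_probs)))


def compare_gradients(world_size=4, num_classes=5, seed=0):
    torch.manual_seed(seed)
    # One locally averaged probability vector per simulated rank (hypothetical data).
    local = [
        torch.softmax(torch.randn(num_classes), dim=0).requires_grad_()
        for _ in range(world_size)
    ]
    # Stand-in for the forward pass: divide by world_size, then sum over ranks.
    avg_probs = torch.stack(local).sum(dim=0) / world_size
    loss = me_max_reg(avg_probs)
    # exact: autograd gradient of one rank's loss w.r.t. rank 0's local input.
    # upstream: gradient arriving at the averaged vector, which an identity
    # backward would hand to the local input unchanged.
    exact, upstream = torch.autograd.grad(loss, [local[0], avg_probs])
    return exact, upstream


if __name__ == "__main__":
    exact, upstream = compare_gradients(world_size=4)
    print(torch.allclose(upstream, 4 * exact))
```

Under these assumptions, the gradient that an identity `backward` passes to each rank is `world_size` times the exact gradient of a single rank's loss with respect to its local input, which matches the observation in the question. Equivalently, it equals the gradient of the regularizer summed over all ranks, since every rank computes the same value. A plain `dist.all_reduce(x)` call works in place and, as far as can be told from the PyTorch documentation, does not define a gradient of its own, which may be one reason a custom `torch.autograd.Function` is used here.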
