From the original paper, I do not see that the gradient needs to be normalized.
If we use the bias-corrected estimates from Adam ("Adam: A Method for Stochastic Optimization"), they should be $\hat{m}_t = m_t / (1-\beta_1^t)$ and $\hat{v}_t = v_t / (1-\beta_2^t)$.
When $t>1$, $1-\beta_1^t \neq 1-\beta_1 = 0.1$ (the two are equal only at $t=1$, for $\beta_1 = 0.9$).
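For reference, a minimal sketch of the bias correction I mean (the hyperparameter values and variable names here are illustrative, not taken from main.py):

```python
import torch

# Illustrative Adam-style moment updates with bias correction,
# assuming beta1 = 0.9 and beta2 = 0.99 as example values.
beta1, beta2 = 0.9, 0.99
m = torch.zeros(3)
v = torch.zeros(3)

for t in range(1, 4):
    grad = torch.randn(3)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction divides by 1 - beta**t, which changes with t;
    # for beta1 = 0.9 it equals 0.1 only at t = 1.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    print(t, 1 - beta1 ** t, 1 - beta2 ** t)
```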
Thanks for your solid and insightful paper.
Lines 377-381 in main.py.
Yet the pseudocode in the original paper "Adaptive Federated Optimization" is
$x_{t+1} = x_{t} + \eta_g \frac{m_t}{\sqrt{v_t}+\tau}$
So maybe `torch.sqrt(delta + epsilon_fedadagrad)` should be changed to `torch.sqrt(delta) + epsilon_fedadagrad`.
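To illustrate the difference, here is a minimal sketch of the two server updates (variable names follow the snippet above and the paper's notation; the tensor values are dummies, not taken from main.py):

```python
import torch

eta_g = 1.0                 # server learning rate
epsilon_fedadagrad = 1e-3   # adaptivity / numerical-stability term (tau in the paper)

x = torch.zeros(3)          # server model x_t
m = torch.randn(3)          # aggregated client update m_t
delta = torch.rand(3)       # accumulated squared updates v_t

# Current code: epsilon placed inside the square root.
x_inside = x + eta_g * m / torch.sqrt(delta + epsilon_fedadagrad)

# Paper's pseudocode: epsilon added after the square root.
x_outside = x + eta_g * m / (torch.sqrt(delta) + epsilon_fedadagrad)
```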
Since I am not familiar with the Adagrad algorithm, I am not sure about this. Could you kindly help me with this issue?