
[Bug]: Swapped alpha and beta in tversky loss? #1993

Closed · Saransh-cpp opened this issue Jun 8, 2022 · 4 comments · Fixed by #1998

@Saransh-cpp (Member)

The Tversky loss has two parameters, $\alpha$ and $\beta$, and Flux internally calculates $\alpha$ as $1 - \beta$. The loss is defined as 1 minus the Tversky index -

$$1 - \frac{\text{True Positives}}{\text{True Positives} + \alpha \cdot \text{False Positives} + \beta \cdot \text{False Negatives}}$$

where the Tversky index is defined in the paper (its eq. 2) as

$$S(P, G; \alpha, \beta) = \frac{|P \cap G|}{|P \cap G| + \alpha |P \setminus G| + \beta |G \setminus P|}$$

where α and β control the magnitude of penalties for FPs and FNs, respectively.
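(As a quick sanity check of the definition - a sketch of my own, with made-up counts and a hypothetical tversky_index helper, not library code:)

# Sketch: Tversky index from TP/FP/FN counts (α weights FPs, β weights FNs,
# per the definition above). Counts are made up for illustration.
tversky_index(tp, fp, fn; α = 0.3, β = 0.7) = tp / (tp + α * fp + β * fn)

tversky_index(3, 2, 1)                    # 3 / (3 + 0.6 + 0.7) ≈ 0.6977
tversky_index(3, 2, 1; α = 0.5, β = 0.5)  # reduces to Dice: 3 / 4.5 ≈ 0.6667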

Flux implements it as -

1 - (sum(y .* ŷ) + 1) / (sum(y .* ŷ + β * (1 .- y) .* ŷ + (1 - β) * y .* (1 .- ŷ)) + 1)

Code -

num = sum(y .* ŷ) + 1  # TP + smoothing
den = sum(y .* ŷ + β * (1 .- y) .* ŷ + (1 - β) * y .* (1 .- ŷ)) + 1  # TP + β·FP + (1-β)·FN + smoothing
1 - num / den

Notice how the term `(1 .- y) .* ŷ` (false positives, I hope I am not wrong) is multiplied by $\beta$, whereas it should be multiplied by $\alpha$ (which is $1 - \beta$). Similarly, the term `y .* (1 .- ŷ)` (false negatives) is multiplied by $\alpha$ (that is, $1 - \beta$), whereas it should be multiplied by $\beta$.
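(A quick check of this identification for binary 0/1 vectors - a sketch of my own, using the same vectors as the example further down:)

y = [0, 1, 0, 1, 1, 1]   # ground truth
ŷ = [1, 1, 0, 1, 1, 0]   # one false positive (index 1), one false negative (index 6)

sum(y .* ŷ)          # 3 -- true positives: both are 1
sum((1 .- y) .* ŷ)   # 1 -- false positives: predicted 1 where truth is 0
sum(y .* (1 .- ŷ))   # 1 -- false negatives: predicted 0 where truth is 1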

This makes the loss function behave in a manner opposite to its documentation. For example -

julia> y = [0, 1, 0, 1, 1, 1];

julia> ŷ_fp = [1, 1, 1, 1, 1, 1];  # 2 false positives -> 2 wrong predictions

julia> ŷ_fnp = [1, 1, 0, 1, 1, 0];  # 1 false negative, 1 false positive -> 2 wrong predictions

julia> Flux.tversky_loss(ŷ_fnp, y)
0.19999999999999996

julia> Flux.tversky_loss(ŷ_fp, y)  # should be smaller than tversky_loss(ŷ_fnp, y), as FN is given more weight
0.21875

Here the loss for (ŷ_fnp, y) should have been larger than the loss for (ŷ_fp, y): the default $\beta$ is 0.7, so the loss should give more weight to (penalize) false negatives, but the exact opposite happens.
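Working through the arithmetic by hand (a sketch; the counts are read off the vectors above, with the current implementation's weighting) -

# ŷ_fp:  TP = 4, FP = 2, FN = 0
1 - (4 + 1) / (4 + 0.7 * 2 + 0.3 * 0 + 1)   # = 1 - 5/6.4 = 0.21875
# ŷ_fnp: TP = 3, FP = 1, FN = 1
1 - (3 + 1) / (3 + 0.7 * 1 + 0.3 * 1 + 1)   # = 1 - 4/5   = 0.2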

Changing the implementation of the loss -

julia> y = [0, 1, 0, 1, 1, 1];

julia> ŷ_fp = [1, 1, 1, 1, 1, 1];  # 2 false positives -> 2 wrong predictions

julia> ŷ_fnp = [1, 1, 0, 1, 1, 0];  # 1 false negative, 1 false positive -> 2 wrong predictions

julia> Flux.tversky_loss(ŷ_fnp, y)
0.19999999999999996

julia> Flux.tversky_loss(ŷ_fp, y)  # now smaller than tversky_loss(ŷ_fnp, y), as expected when FN is given more weight
0.1071428571428571

which looks right.
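For reference, the change tried above simply swaps the two weights - a sketch under a hypothetical name, omitting Flux's internal _check_sizes and ofeltype helpers; not the final PR:

function tversky_loss_swapped(ŷ, y; β = 0.7)
    num = sum(y .* ŷ) + 1
    # β now weights the false negatives and (1 - β) the false positives,
    # matching the documented behaviour
    den = sum(y .* ŷ + (1 - β) * (1 .- y) .* ŷ + β * y .* (1 .- ŷ)) + 1
    return 1 - num / den
end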

Is this a bug, or am I missing something? Would be happy to create a PR if it is a bug!

@ToucheSir (Member)

One easy way to check is to compare against implementations in other libraries. Are they consistent with the current behaviour, the proposed change or neither?

@Saransh-cpp (Member, Author)

Thanks for the suggestion, @ToucheSir! I rechecked Flux's implementation of tversky_loss against the one in the paper -

Paper

from keras import backend as K

smooth = 1  # smoothing constant; corresponds to the `+ 1` in Flux's version

def tversky(y_true, y_pred):
    y_true_pos = K.flatten(y_true)
    y_pred_pos = K.flatten(y_pred)
    true_pos = K.sum(y_true_pos * y_pred_pos)
    false_neg = K.sum(y_true_pos * (1 - y_pred_pos))  # y .* (1 .- ŷ) -> FluxML
    false_pos = K.sum((1 - y_true_pos) * y_pred_pos)  # (1 .- y) .* ŷ -> FluxML
    alpha = 0.3
    return (true_pos + smooth) / (true_pos + alpha * false_neg + (1 - alpha) * false_pos + smooth)

def tversky_loss(y_true, y_pred):
    return 1 - tversky(y_true, y_pred)

Flux

function tversky_loss(ŷ, y; β = ofeltype(ŷ, 0.7))
    _check_sizes(ŷ, y)
    #TODO add agg
    num = sum(y .* ŷ) + 1
    den = sum(y .* ŷ + β * (1 .- y) .* ŷ + (1 - β) * y .* (1 .- ŷ)) + 1
    1 - num / den
end

The implementations look different because of the swapped alpha and beta values, but they act the same!
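(A quick numerical check of this, a sketch of my own - paper_loss and flux_loss below are standalone re-implementations of the two formulas for comparison only, not library code:)

# Paper's parameterization: α weights FN, (1 - α) weights FP
paper_loss(ŷ, y; α = 0.3, smooth = 1) = begin
    tp = sum(y .* ŷ)
    fn = sum(y .* (1 .- ŷ))
    fp = sum((1 .- y) .* ŷ)
    1 - (tp + smooth) / (tp + α * fn + (1 - α) * fp + smooth)
end

# Flux's parameterization: β weights FP, (1 - β) weights FN
flux_loss(ŷ, y; β = 0.7) = begin
    num = sum(y .* ŷ) + 1
    den = sum(y .* ŷ + β * (1 .- y) .* ŷ + (1 - β) * y .* (1 .- ŷ)) + 1
    1 - num / den
end

y = rand(0:1, 100); ŷ = rand(100);
paper_loss(ŷ, y) ≈ flux_loss(ŷ, y)   # true, since α = 1 - β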

@ToucheSir (Member)

So just to clarify, using the same alpha/beta gives you the correct answer in both implementations? If there are any values of either hyperparam where they differ, I think the issue is still valid.

@Saransh-cpp (Member, Author)

Using alpha = 0.3 in the paper's implementation and beta = 0.7 in Flux's gives identical results. Simplified expressions -

Paper

alpha = 0.3
tversky_loss = 1 - (true_pos + smooth)/(true_pos + alpha*false_neg + (1-alpha)*false_pos + smooth)

Flux

beta = 0.7
tversky_loss = 1 - (true_pos + smooth)/(true_pos + (1-beta)*false_neg + beta*false_pos + smooth)
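Substituting $\beta = 1 - \alpha$ makes the equivalence of the two denominators explicit -

$$\text{true\_pos} + \beta \cdot \text{false\_pos} + (1 - \beta) \cdot \text{false\_neg} \;\overset{\beta = 1 - \alpha}{=}\; \text{true\_pos} + (1 - \alpha) \cdot \text{false\_pos} + \alpha \cdot \text{false\_neg}$$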

I was looking at the wrong mathematical expression (probably a mistake in the blog post I was reading) -

Wrong (what I was referring to)

$$1 - \frac{\text{True Positives}}{\text{True Positives} + \alpha \cdot \text{False Positives} + \beta \cdot \text{False Negatives}}$$

Correct

$$1 - \frac{\text{True Positives}}{\text{True Positives} + \alpha \cdot \text{False Negatives} + \beta \cdot \text{False Positives}}$$
