
Soft F1 Score - Incorrect result due to row order not getting preserved #15

shravankshenoy opened this issue Sep 14, 2024 · 0 comments


shravankshenoy commented Sep 14, 2024

I recently tried using the Soft F1 Score on a custom dataset and got an incorrect result because row order is not preserved, as shown in the code sample below:

def calculate_row_match(predicted_row, ground_truth_row):
    total_columns = len(ground_truth_row)
    matches = 0
    element_in_pred_only = 0
    element_in_truth_only = 0
    for pred_val in predicted_row:
        if pred_val in ground_truth_row:
            matches += 1
        else:
            element_in_pred_only += 1
    for truth_val in ground_truth_row:
        if truth_val not in predicted_row:
            element_in_truth_only += 1
    match_percentage = matches / total_columns
    pred_only_percentage = element_in_pred_only / total_columns
    truth_only_percentage = element_in_truth_only / total_columns
    return match_percentage, pred_only_percentage, truth_only_percentage



def calculate_f1_score(predicted, ground_truth):
    # if both predicted and ground_truth are empty, return 1.0 for f1_score
    if not predicted and not ground_truth:
        return 1.0

    # Drop duplicates
    predicted_set = set(predicted) if predicted else set()
    ground_truth_set = set(ground_truth)

    # convert back to list
    predicted = list(predicted_set)
    ground_truth = list(ground_truth_set)

    # Calculate matching scores for each possible pair
    match_scores = []
    pred_only_scores = []
    truth_only_scores = []
    for i, gt_row in enumerate(ground_truth):
        # rows only in the ground truth results
        if i >= len(predicted):
            match_scores.append(0)
            truth_only_scores.append(1)
            continue
        pred_row = predicted[i]
        match_score, pred_only_score, truth_only_score = calculate_row_match(
            pred_row, gt_row
        )
        match_scores.append(match_score)
        pred_only_scores.append(pred_only_score)
        truth_only_scores.append(truth_only_score)

    # rows only in the predicted results
    for i in range(len(predicted) - len(ground_truth)):
        match_scores.append(0)
        pred_only_scores.append(1)
        truth_only_scores.append(0)

    tp = sum(match_scores)
    fp = sum(pred_only_scores)
    fn = sum(truth_only_scores)

    precision = tp / (tp + fp) if tp + fp > 0 else 0
    recall = tp / (tp + fn) if tp + fn > 0 else 0

    f1_score = (
        2 * precision * recall / (precision + recall) if precision + recall > 0 else 0
    )
    return f1_score


print(calculate_f1_score([('apples', 325), ('banana', 191)], [(325, 'apples'), (191, 'banana')]))
# Output: 0

The F1 score is zero although it should be 1.

This happens because set(predicted) and set(ground_truth) jumble the rows, since sets do not preserve insertion order (so 'apples' ends up being searched for in the 'banana' row and vice versa).
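For illustration, here is a small standalone snippet (not from the repository, just assumed example data) showing the difference; the exact set iteration order is implementation-dependent, so the reordering may or may not appear on a particular run:

rows = [(325, 'apples'), (191, 'banana')]

# set() iteration order is implementation-defined and need not match
# insertion order, while dict.fromkeys() always keeps first-insertion order.
via_set = list(set(rows))             # order may differ from rows
via_dict = list(dict.fromkeys(rows))  # order is guaranteed to match rows

print(via_set)   # e.g. [(191, 'banana'), (325, 'apples')] -- not guaranteed
print(via_dict)  # [(325, 'apples'), (191, 'banana')]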

A potential fix is to use dict.fromkeys() instead of set() in the calculate_f1_score function, which still removes duplicates but preserves row order, as shown below:

predicted = list(dict.fromkeys(predicted))
ground_truth = list(dict.fromkeys(ground_truth))
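In context, the deduplication step in calculate_f1_score would then look roughly like this (a sketch of the proposed change, not a tested patch; the "if predicted else []" guard keeps the original handling of an empty predicted result):

# Inside calculate_f1_score, replacing the set()-based deduplication:
predicted = list(dict.fromkeys(predicted)) if predicted else []
ground_truth = list(dict.fromkeys(ground_truth))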

With this modification, we get the correct output for the same example:

print(calculate_f1_score([('apples', 325), ('banana', 191)], [(325, 'apples'), (191, 'banana')]))
# Output: 1
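
As a quick sanity check (on assumed example data), dict.fromkeys() still drops duplicate rows while keeping the first occurrence, so the original purpose of the set()-based deduplication is preserved:

rows_with_dupes = [('apples', 325), ('banana', 191), ('apples', 325)]
print(list(dict.fromkeys(rows_with_dupes)))
# Output: [('apples', 325), ('banana', 191)]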

Curious to hear your thoughts on this. Also, thanks for creating this amazing dataset!
