
Soft F1 Score - Incorrect result due to row order not getting preserved #15

shravankshenoy opened this issue Sep 14, 2024 · 0 comments


shravankshenoy commented Sep 14, 2024

I recently tried using the Soft F1 Score on a custom dataset and got an incorrect result because row order is not preserved, as shown in the code sample below:

def calculate_row_match(predicted_row, ground_truth_row):
    total_columns = len(ground_truth_row)
    matches = 0
    element_in_pred_only = 0
    element_in_truth_only = 0
    for pred_val in predicted_row:
        if pred_val in ground_truth_row:
            matches += 1
        else:
            element_in_pred_only += 1
    for truth_val in ground_truth_row:
        if truth_val not in predicted_row:
            element_in_truth_only += 1
    match_percentage = matches / total_columns
    pred_only_percentage = element_in_pred_only / total_columns
    truth_only_percentage = element_in_truth_only / total_columns
    return match_percentage, pred_only_percentage, truth_only_percentage



def calculate_f1_score(predicted, ground_truth):
    # if both predicted and ground_truth are empty, return 1.0 for f1_score
    if not predicted and not ground_truth:
        return 1.0

    # Drop duplicates
    predicted_set = set(predicted) if predicted else set()
    ground_truth_set = set(ground_truth)

    # convert back to list
    predicted = list(predicted_set)
    ground_truth = list(ground_truth_set)

    # Calculate matching scores for each possible pair
    match_scores = []
    pred_only_scores = []
    truth_only_scores = []
    for i, gt_row in enumerate(ground_truth):
        # rows only in the ground truth results
        if i >= len(predicted):
            match_scores.append(0)
            truth_only_scores.append(1)
            continue
        pred_row = predicted[i]
        match_score, pred_only_score, truth_only_score = calculate_row_match(
            pred_row, gt_row
        )
        match_scores.append(match_score)
        pred_only_scores.append(pred_only_score)
        truth_only_scores.append(truth_only_score)

    # rows only in the predicted results
    for i in range(len(predicted) - len(ground_truth)):
        match_scores.append(0)
        pred_only_scores.append(1)
        truth_only_scores.append(0)

    tp = sum(match_scores)
    fp = sum(pred_only_scores)
    fn = sum(truth_only_scores)

    precision = tp / (tp + fp) if tp + fp > 0 else 0
    recall = tp / (tp + fn) if tp + fn > 0 else 0

    f1_score = (
        2 * precision * recall / (precision + recall) if precision + recall > 0 else 0
    )
    return f1_score


print(calculate_f1_score([('apples', 325), ('banana', 191)], [(325, 'apples'), (191, 'banana')]))
# Output: 0

The F1 score is zero although it should be 1.

This happens because set(predicted) and set(ground_truth) jumble the rows, since sets do not preserve insertion order (so 'apples' ends up being searched for in the 'banana' row and vice versa).
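For illustration, here is a small standalone snippet (not from the repository, just assumed example data) showing the difference; the exact set iteration order is implementation-dependent, so the reordering may or may not appear on a particular run:

rows = [(325, 'apples'), (191, 'banana')]

# set() iteration order is implementation-defined and need not match
# insertion order, while dict.fromkeys() always keeps first-insertion order.
via_set = list(set(rows))             # order may differ from rows
via_dict = list(dict.fromkeys(rows))  # order is guaranteed to match rows

print(via_set)   # e.g. [(191, 'banana'), (325, 'apples')] -- not guaranteed
print(via_dict)  # [(325, 'apples'), (191, 'banana')]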

A potential fix is to use dict.fromkeys() instead of set() in the calculate_f1_score function, which still removes duplicates but preserves row order, as shown below:

predicted = list(dict.fromkeys(predicted))
ground_truth = list(dict.fromkeys(ground_truth))
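In context, the deduplication step in calculate_f1_score would then look roughly like this (a sketch of the proposed change, not a tested patch; the "if predicted else []" guard keeps the original handling of an empty predicted result):

# Inside calculate_f1_score, replacing the set()-based deduplication:
predicted = list(dict.fromkeys(predicted)) if predicted else []
ground_truth = list(dict.fromkeys(ground_truth))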

With this modification, we get the correct output for the same example:

print(calculate_f1_score([('apples', 325), ('banana', 191)], [(325, 'apples'), (191, 'banana')]))
# Output: 1
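
As a quick sanity check (on assumed example data), dict.fromkeys() still drops duplicate rows while keeping the first occurrence, so the original purpose of the set()-based deduplication is preserved:

rows_with_dupes = [('apples', 325), ('banana', 191), ('apples', 325)]
print(list(dict.fromkeys(rows_with_dupes)))
# Output: [('apples', 325), ('banana', 191)]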

Curious to hear your thoughts on this. Also, thanks for creating this amazing dataset!
