I recently tried using the Soft F1 Score on custom data and was getting incorrect results because row order is not preserved, as shown in the code sample below:
def calculate_row_match(predicted_row, ground_truth_row):
    total_columns = len(ground_truth_row)
    matches = 0
    element_in_pred_only = 0
    element_in_truth_only = 0
    # count predicted values that do / do not appear in the ground-truth row
    for pred_val in predicted_row:
        if pred_val in ground_truth_row:
            matches += 1
        else:
            element_in_pred_only += 1
    # count ground-truth values missing from the predicted row
    for truth_val in ground_truth_row:
        if truth_val not in predicted_row:
            element_in_truth_only += 1
    match_percentage = matches / total_columns
    pred_only_percentage = element_in_pred_only / total_columns
    truth_only_percentage = element_in_truth_only / total_columns
    return match_percentage, pred_only_percentage, truth_only_percentage
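As a quick sanity check, note that calculate_row_match itself is insensitive to column order within a single row, so it is not the source of the problem:

print(calculate_row_match(('apples', 325), (325, 'apples')))
# (1.0, 0.0, 0.0)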
def calculate_f1_score(predicted, ground_truth):
    # if both predicted and ground_truth are empty, return 1.0 for f1_score
    if not predicted and not ground_truth:
        return 1.0

    # Drop duplicates
    predicted_set = set(predicted) if predicted else set()
    ground_truth_set = set(ground_truth)

    # convert back to list
    predicted = list(predicted_set)
    ground_truth = list(ground_truth_set)

    # Calculate matching scores for each possible pair
    match_scores = []
    pred_only_scores = []
    truth_only_scores = []
    for i, gt_row in enumerate(ground_truth):
        # rows only in the ground truth results
        if i >= len(predicted):
            match_scores.append(0)
            truth_only_scores.append(1)
            continue

        pred_row = predicted[i]
        match_score, pred_only_score, truth_only_score = calculate_row_match(
            pred_row, gt_row
        )
        match_scores.append(match_score)
        pred_only_scores.append(pred_only_score)
        truth_only_scores.append(truth_only_score)

    # rows only in the predicted results
    for i in range(len(predicted) - len(ground_truth)):
        match_scores.append(0)
        pred_only_scores.append(1)
        truth_only_scores.append(0)

    tp = sum(match_scores)
    fp = sum(pred_only_scores)
    fn = sum(truth_only_scores)

    precision = tp / (tp + fp) if tp + fp > 0 else 0
    recall = tp / (tp + fn) if tp + fn > 0 else 0
    f1_score = (
        2 * precision * recall / (precision + recall) if precision + recall > 0 else 0
    )

    return f1_score
print(calculate_f1_score([('apples', 325), ('banana', 191)], [(325, 'apples'), (191, 'banana')]))
# Output: 0
The F1 score is zero although it should be 1.
This happens because set(predicted) and set(ground_truth) jumble the rows, as sets do not preserve insertion order (so 'apples' ends up being searched for in the banana row and vice versa).
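A quick illustration of the reordering (the exact set order varies across runs, since Python randomizes string hashes):

rows = [('apples', 325), ('banana', 191)]
print(list(set(rows)))            # arbitrary order, e.g. [('banana', 191), ('apples', 325)]
print(list(dict.fromkeys(rows)))  # insertion order preserved: [('apples', 325), ('banana', 191)]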
The potential solution is to use dict.fromkeys() instead of set() to preserve row order in the calculate_f1_score function, as shown below.
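A minimal sketch of the change; only the de-duplication lines differ from the original function (dict.fromkeys requires hashable rows, which the tuples here are):

def calculate_f1_score(predicted, ground_truth):
    # if both predicted and ground_truth are empty, return 1.0 for f1_score
    if not predicted and not ground_truth:
        return 1.0

    # Drop duplicates while preserving row order (dict keys keep insertion order)
    predicted = list(dict.fromkeys(predicted)) if predicted else []
    ground_truth = list(dict.fromkeys(ground_truth))

    # ... the rest of the function is unchanged ...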
With the above modified code, we get the right output for the same example:
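print(calculate_f1_score([('apples', 325), ('banana', 191)], [(325, 'apples'), (191, 'banana')]))
# Output: 1.0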
Curious to know your thoughts on this. Also, thanks for creating this amazing dataset!