-
@a-kore
And here's the code that calls this function in its call stack:
Please let me know if I'm missing something.
-
Each slice should have at least an `overall` split to specify that it hasn't been sliced. Also, the method `log_quantitative_analysis` is used once per metric, so only one pass/fail threshold can be specified for it. The notebooks for the use cases are a useful demo for the function. Here's a snippet from the `heart_failure_prediction.ipynb` notebook that shows how the metrics from the evaluator are logged:

```python
results_female_flat = flatten_results_dict(
    results=results_female,
    model_name=model_name,
)

# ruff: noqa: W505
descriptions = {
    "BinaryPrecision": "The proportion of predicted positive instances that are correctly predicted.",
    "BinaryRecall": "The proportion of actual positive instances that are correctly predicted. Also known as sensitivity or true positive rate.",
    "BinaryAccuracy": "The proportion of all instances that are correctly predicted.",
    "BinaryAUROC": "The area under the receiver operating characteristic curve (AUROC) is a measure of the performance of a binary classification model.",
    "BinaryAveragePrecision": "The area under the precision-recall curve (AUPRC) is a measure of the performance of a binary classification model.",
    "BinaryF1Score": "The harmonic mean of precision and recall.",
}

for name, metric in results_female_flat.items():
    # Each key has the form "<split>/<metric name>", e.g. "overall/BinaryAccuracy".
    split, name = name.split("/")  # noqa: PLW2901
    report.log_quantitative_analysis(
        "performance",
        name=name,
        value=metric.tolist(),
        description=descriptions[name],
        metric_slice=split,
        pass_fail_thresholds=0.7,
        pass_fail_threshold_fns=lambda x, threshold: bool(x >= threshold),
    )
```
-
Another thing that I encountered during testing is that when I call
This is what
-
Yeah, it looks like logging just one metric is an edge case that would fail. I think it has to do with the
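For anyone trying to reproduce it, feeding a flattened dict with a single entry through the same loop should hit the edge case. This is a hypothetical sketch: `results_single_flat`, its key, and its value are made up, and only the `log_quantitative_analysis` call mirrors the snippet above:

```python
# Hypothetical single-metric input; the key follows the "<split>/<metric name>" format.
results_single_flat = {"overall/BinaryAccuracy": 0.83}

for name, metric in results_single_flat.items():
    split, name = name.split("/")
    report.log_quantitative_analysis(
        "performance",
        name=name,
        value=metric,  # already a plain float here, so no .tolist()
        description="The proportion of all instances that are correctly predicted.",
        metric_slice=split,
        pass_fail_thresholds=0.7,
        pass_fail_threshold_fns=lambda x, threshold: bool(x >= threshold),
    )
```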