These csv files should live on an orphan branch of the variant-nowcast-hub-dashboard repo named predevals/data. There should be a folder and file structure like the following:
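A sketch of what that structure might look like, pieced together from the file paths referenced below; the exact set of disaggregation subfolders (location, nowcast_date, target_date, horizon) is inferred from the staged plan at the end of this issue and should be treated as an assumption:

```
(repo root on the predevals/data branch)
└── scores/
    └── clade prop/
        └── Full season/
            ├── scores.csv              # overall average scores per model
            ├── location/
            │   └── scores.csv          # averages per model and location
            ├── nowcast_date/
            │   └── scores.csv          # averages per model and nowcast_date
            ├── target_date/
            │   └── scores.csv          # averages per model and target_date
            └── horizon/
                └── scores.csv          # averages per model and horizon
```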
The scores in scores/clade prop/Full season/scores.csv will be overall average scores for each model, while the scores in subfolders will be disaggregated by the corresponding variable. For example, scores in scores/clade prop/Full season/location/scores.csv will be average scores for each model and location.
For scores/clade prop/Full season/scores.csv, we expect the following columns (a rough sketch of how these could be computed follows the list):
"model_id": the id of the model
"energy_score": the mean energy score for that model, obtained as an average of energy scores for each combination of location/nowcast_date/target_date that was predicted by that model over the course of the season.
"se_point": the mean squared error of predictive means for that model, obtained as an average of squared errors of predictive means for each combination of location/nowcast_date/target_date that was predicted by that model over the course of the season.
"interval_coverage_50": empirical coverage rate of marginal 50% prediction intervals for each clade. These should be prediction intervals for the observed data, i.e., obtained including a multinomial sampling step.
"interval_coverage_95": empirical coverage rate of marginal 95% prediction intervals for each clade. These should be prediction intervals for the observed data, i.e., obtained including a multinomial sampling step.
"n": the number of location/nowcast_date/target_date combinations that were averaged across for this model to compute the mean score
For scores/clade prop/Full season/location/scores.csv (for example, disaggregating by location), we expect the following columns (the sketch continues after this list):
"model_id": the id of the model
"location": the id of the location, a state code as used by the variant nowcast hub.
"energy_score": the average energy score for that model, obtained as a mean of energy scores for each combination of location/nowcast_date/target_date that was predicted by that model over the course of the season.
"interval_coverage_50": empirical coverage rate of marginal 50% prediction intervals for each clade. These should be prediction intervals for the observed data, i.e., obtained including a multinomial sampling step.
"interval_coverage_95": empirical coverage rate of marginal 95% prediction intervals for each clade. These should be prediction intervals for the observed data, i.e., obtained including a multinomial sampling step.
"n": the number of location/nowcast_date/target_date combinations that were averaged across for this model to compute the mean score
Note: A serious limitation of the above proposal is that average scores will average across the different sets of locations and dates predicted by each model, and as a result they will not be truly comparable across models. The way that we've most often handled this in the past is to calculate relative scores using the procedure outlined here: https://epiforecasts.io/scoringutils/reference/get_pairwise_comparisons.html. Unfortunately, I think it would be challenging for us to actually use that function as the scoringutils package uses quite a specific representation of scores data behind the scenes. But it would be good to eventually add in some approach to handling comparison of models that have submitted predictions for different subsets of locations and dates.
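To make that eventual direction a bit more concrete, here is a very simplified sketch of the pairwise-comparison idea (ratios of mean scores over the tasks a pair of models have in common, combined with a geometric mean). It is not a re-implementation of the scoringutils function, and the helper name and task columns are assumptions:

```python
import itertools

import numpy as np
import pandas as pd


def relative_skill(per_task: pd.DataFrame, score_col: str = "energy_score") -> pd.Series:
    """Pairwise relative skill per model; smaller is better."""
    task_cols = ["location", "nowcast_date", "target_date"]
    models = per_task["model_id"].unique()
    ratios = {m: [] for m in models}
    for a, b in itertools.combinations(models, 2):
        scores_a = per_task.loc[per_task["model_id"] == a].set_index(task_cols)[score_col]
        scores_b = per_task.loc[per_task["model_id"] == b].set_index(task_cols)[score_col]
        shared = scores_a.index.intersection(scores_b.index)
        if len(shared) == 0:
            continue  # no overlapping tasks for this pair of models
        # ratio of mean scores over the shared tasks only
        ratio = scores_a.loc[shared].mean() / scores_b.loc[shared].mean()
        ratios[a].append(ratio)
        ratios[b].append(1.0 / ratio)
    # geometric mean of each model's ratios against all other models
    return pd.Series(
        {m: float(np.exp(np.mean(np.log(r)))) if r else np.nan for m, r in ratios.items()},
        name="relative_skill",
    )
```

The scoringutils version does more than this (I believe it also supports scaling by a baseline model, among other things); the point here is just that each comparison is restricted to overlapping tasks, which is what addresses the comparability issue described above.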
It likely makes sense to tackle this issue in stages, e.g. with the following steps (or whatever other stepped approach makes sense to the person doing this work):
Add only the overall average energy scores, not including squared errors or interval coverage rates and not broken down by location, nowcast_date, target_date, or horizon.
Add in scores disaggregated by location, nowcast_date, target_date, and horizon.
Add squared error of point predictions.
Add interval coverage rates (one possible computation is sketched after this list).
Add relative skill scores.
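For the interval coverage step in particular, one way to get intervals "for the observed data" as described above is to push each predictive draw of clade proportions through a multinomial draw with the observed number of sequences, and then take central quantiles of the resulting counts. A rough sketch; the function name, the use of counts rather than proportions, and the quantile-based interval construction are all assumptions:

```python
import numpy as np


def coverage_indicators(samples, observed_counts, level=0.95, rng=None):
    """For one prediction task, return a 0/1 indicator per clade of whether the
    observed clade count fell inside the central `level` prediction interval.

    samples         -- (n_samples, n_clades) predictive draws of clade proportions
    observed_counts -- (n_clades,) observed sequence counts for that task
    """
    rng = rng or np.random.default_rng()
    n_seq = int(np.sum(observed_counts))
    # multinomial sampling step: turn each draw of proportions into a draw of
    # observable counts, so the intervals apply to the observed data
    count_draws = np.vstack(
        [rng.multinomial(n_seq, p / np.sum(p)) for p in samples]
    )
    alpha = 1.0 - level
    lower = np.quantile(count_draws, alpha / 2, axis=0)
    upper = np.quantile(count_draws, 1 - alpha / 2, axis=0)
    return ((observed_counts >= lower) & (observed_counts <= upper)).astype(int)
```

Averaging these indicators across clades and tasks (with the appropriate grouping by model, and by location etc. for the disaggregated files) would then give the interval_coverage_50 and interval_coverage_95 columns.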
This all seems like a good breakdown to me. One question, which might really be a more general hub dashboard question: are we locked into the "Full season" name, or could we name it something else? Asking because the idea of a "season" is maybe not as relevant here, where we expect this hub to keep running in and out of the regular respiratory virus season.