-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Distinguish NA (missing) from NA (accumulated) #547
Comments
Another option would be to distinguish between explicit (date exists in the data, value is NA meaning missing) vs. implicit (date doesn't exist in the data meaning accumulate) NAs which might be easier preprocessing if potentially easier to inadvertently get wrong. |
yeah potentially but also think that could be a bit dangerous. I would instead suggest making a helper function that maps from that structure to the less dangerous explicit version for those that clearly want that. |
and if going that way I'd suggest that becomes a dependent issue |
With appropriate warnings messages as suggested in #771 then I think this is the best option as it can take all information from a 2-column data frame as before:
|
I don't think I agree. I think there should be one way of handling missing data (as missing) and it can throw a warning if creating missing dates saying what it is doing. I think overloading NAs like we have done for accumulation is confusing and dangerous and would much prefer a separate feature describing this. Something I think we want to be aware of is non-standard schemes. These could be 1. Non-constant reporting and 2. repeated reporting (some counts are reported twice as aggregates of different dates). I haven't really seen the latter and its quite an edge case so its unclear to me if we really want to support it or not. |
I'm open to suggestions and acknowledge there are dangers in overloading interpretations. My ideal would be one in which it's fairly straightforward (and safe) to handle the common cases of daily/weekly data on incidence/prevalence and missingness that could correspond to zeroes or missed reports. |
This can now be checked with the |
Do you not think this is possible without overloading using the kind of approach I have suggested with helper utilities? |
It probably is. It could relate to supporting option (1) as suggested in #346 (comment) |
I'm not sure I totally follow this now. Does the suggestion support a mixture of accumulation and missing data? |
estimate_infections would expect an accumulation column if one wanted to accumulate but a user could create this based on some interpretation of NAs using a separate function. |
and NAs would go back to working only with missing data and you wouldn't support repeated reporting? |
if yes and no sounds good |
I think this would be my preferred option as it would be more general but I also think it can be addressed in its own review as it would be a superset of this PR.
My thought on how that would work is to have a new variable (
accumulate
) that indicates which days should be summed.Originally posted by @seabbs in #534 (review)
The text was updated successfully, but these errors were encountered: