Very low-priority for now unless we run into it testing epiprocess and epipredict on COVID-19 data. While it seemed important for NoroSTAT acquisition in delphi-epidata due to the format of the web site involved, it may not matter for most prepared epi data sets.
Currently, we partially treat snapshot-row/measurement nonexistence and NA-ness as different; for example, we can produce snapshots with rows that have all NA measurements. However, `epi_archive`s sometimes mix up row/measurement nonexistence and NA-ness. E.g.:
- There's no built-in way to express row/measurement removals. (Before the initial version of a data set row, `as_of` will not produce a row for that measurement, but if it is removed in some version, `as_of` will produce NAs from that version until the next version in which it is added back, if any.)
- Merges can "create" NA measurements for earlier time values and versions of one signal because update data exists there for other signals. This doesn't make sense if we treat nonexistence and NA-ness as different, but would if we treat them as the same.
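To make the two cases above concrete, here is a minimal toy sketch in plain Python. It is not the epiprocess API: the row layout, `as_of`, and `merged_as_of` are hypothetical stand-ins, `None` plays the role of NA, and the merge is simplified to an outer join of two snapshots.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy layout: (geo, time_value, version, value); None stands in for NA.
Key = Tuple[str, int]
Update = Tuple[str, int, int, Optional[float]]

def as_of(updates: List[Update], version: int) -> Dict[Key, Optional[float]]:
    """Latest value per (geo, time_value) among updates with version <= `version`."""
    latest: Dict[Key, Optional[float]] = {}
    for geo, t, v, value in sorted(updates, key=lambda u: u[2]):
        if v <= version:
            latest[(geo, t)] = value
    return latest

def merged_as_of(x: List[Update], y: List[Update], version: int):
    """Outer-join the two snapshots; a key missing from one side becomes None there."""
    sx, sy = as_of(x, version), as_of(y, version)
    return {k: (sx.get(k), sy.get(k)) for k in sx.keys() | sy.keys()}

x_updates = [
    ("ca", 1, 2, 10.0),  # first reported in version 2
    ("ca", 1, 4, None),  # "removed" in version 4, but only expressible as NA
    ("ca", 1, 6, 12.0),  # added back in version 6
]
y_updates = [("ca", 1, 5, 7.0)]  # signal y does not exist before version 5

before_first = as_of(x_updates, 1)   # no row at all
while_removed = as_of(x_updates, 5)  # row present, value is None
merged_early = merged_as_of(x_updates, y_updates, 3)  # y shows up as None
```

In this toy model, a removal and a genuinely reported NA are indistinguishable in `while_removed`, and `merged_early` reports a `None` for `y` at a version where `y` never had a row.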
On a related note, in development of compactification in #97 and #101, the following situation was discussed: the update data contains a measurement with an initial value of NA; should this row be omitted? If it is omitted, then (unless a merge reintroduces an NA for that measurement) it will be treated as nonexistent, and the user may get errors trying to get measurements as of some version that they expect to exist.
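A rough sketch of that situation, again as a toy model rather than the actual compactification code from #97 and #101: `compactify`, its `drop_initial_na` flag, and the row layout are all hypothetical, and `None` stands in for NA.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]
Update = Tuple[str, int, int, Optional[float]]  # (geo, time_value, version, value)

def as_of(updates: List[Update], version: int) -> Dict[Key, Optional[float]]:
    """Latest value per (geo, time_value) among updates with version <= `version`."""
    latest: Dict[Key, Optional[float]] = {}
    for geo, t, v, value in sorted(updates, key=lambda u: u[2]):
        if v <= version:
            latest[(geo, t)] = value
    return latest

def compactify(updates: List[Update], drop_initial_na: bool) -> List[Update]:
    """Drop updates that repeat the previously kept value for the same key.

    If `drop_initial_na` is True, an initial None is also dropped, as if the
    measurement had never been reported.
    """
    kept: List[Update] = []
    last: Dict[Key, Optional[float]] = {}
    for geo, t, v, value in sorted(updates, key=lambda u: u[2]):
        key = (geo, t)
        if key in last:
            if last[key] == value:
                continue  # redundant: same as the last kept value
        elif value is None and drop_initial_na:
            continue  # initial NA omitted; the key stays "nonexistent"
        last[key] = value
        kept.append((geo, t, v, value))
    return kept

updates = [("ca", 1, 1, None), ("ca", 1, 3, 5.0)]

kept_na = as_of(compactify(updates, drop_initial_na=False), 2)
dropped_na = as_of(compactify(updates, drop_initial_na=True), 2)
```

With the initial NA kept, `kept_na` has a row for `("ca", 1)` whose value is `None`. With it dropped, `dropped_na` has no such row, so `dropped_na[("ca", 1)]` raises a `KeyError` even though the source did publish that measurement (as NA) in version 1.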
A couple of potential approaches to tracking separately:
- Augment archive update data with a logical column for row/measurement existence flags. (SQL-based version here.)
- Have separate tables/archives tracking removal of measurements (and/or other objects to specify some nice existence pattern + a removal table/archive for exceptions). This should be more space-efficient, as we don't need to add any columns to the original archive.
  - This would complicate ways to create archives + existing operations.
  - Merges would need to deal with removal tables (e.g., store a list of removal tables, or merge them).
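The two approaches might look roughly like this in a toy model. Everything here is hypothetical (the row layouts, the `exists` flag, the removal-table format, and the function names); it only illustrates how a snapshot could tell "removed" apart from "present but NA", with `None` standing in for NA.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]

def latest_as_of(rows: List[tuple], version: int) -> Dict[Key, tuple]:
    """Latest payload per (geo, time_value) among rows with version <= `version`.

    Each row is (geo, time_value, version, *payload).
    """
    latest: Dict[Key, tuple] = {}
    for row in sorted(rows, key=lambda r: r[2]):
        if row[2] <= version:
            latest[(row[0], row[1])] = row[3:]
    return latest

# Approach 1: an extra logical `exists` column on every update row.
def as_of_flagged(updates: List[tuple], version: int) -> Dict[Key, Optional[float]]:
    latest = latest_as_of(updates, version)
    return {k: value for k, (value, exists) in latest.items() if exists}

# Approach 2: the original updates stay as-is; removals live in their own table.
def as_of_with_removals(updates: List[tuple], removals: List[tuple],
                        version: int) -> Dict[Key, Optional[float]]:
    values = latest_as_of(updates, version)
    removed = latest_as_of(removals, version)
    return {k: value for k, (value,) in values.items()
            if not removed.get(k, (False,))[0]}

flagged = [
    ("ca", 1, 2, 10.0, True),
    ("ca", 1, 4, None, False),  # removed in version 4
    ("ca", 1, 6, 12.0, True),   # added back in version 6
    ("tx", 1, 2, None, True),   # exists, but the measurement is NA
]
plain = [("ca", 1, 2, 10.0), ("ca", 1, 6, 12.0), ("tx", 1, 2, None)]
removals = [("ca", 1, 4, True), ("ca", 1, 6, False)]

snap_flagged = as_of_flagged(flagged, 5)
snap_removals = as_of_with_removals(plain, removals, 5)
```

Both snapshots at version 5 omit `("ca", 1)` and keep `("tx", 1)` with a `None` value. The second approach leaves the original update rows untouched, at the cost of every operation (including merges) having to carry the removal table along.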
Alternatively, we could unify nonexistence and NA-ness in some sense, which might simplify reasoning for various functions dealing with archives (e.g., #88). We'd need NA to represent both, but still might need some idea of "nonexistence" to catch requests for unrecognized geo&additkey values, or when merging data sets with mismatched existing geo&additkey sets.
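As a rough illustration of the unified option, here is a toy lookup in which a missing row and an NA value both come back as `None`, while an unrecognized geo still raises. The `lookup` helper and the `known_geos` set are hypothetical, not part of any existing interface.

```python
from typing import Dict, Optional, Set, Tuple

Key = Tuple[str, int]  # (geo, time_value)

def lookup(snapshot: Dict[Key, Optional[float]], known_geos: Set[str],
           geo: str, time_value: int) -> Optional[float]:
    """Return the measurement, using None for both NA and a missing row.

    Only a geo outside `known_geos` counts as truly nonexistent.
    """
    if geo not in known_geos:
        raise KeyError(f"unrecognized geo: {geo!r}")
    return snapshot.get((geo, time_value))

snapshot = {("ca", 1): 10.0, ("ca", 2): None}
known_geos = {"ca", "tx"}

reported = lookup(snapshot, known_geos, "ca", 1)     # 10.0
explicit_na = lookup(snapshot, known_geos, "ca", 2)  # None: row exists, value NA
missing_row = lookup(snapshot, known_geos, "tx", 1)  # None: no row for this key
```

Here `lookup(snapshot, known_geos, "zz", 1)` raises a `KeyError`, which is the residual notion of nonexistence this option would still need.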