Release NCHS Mortality #367
Thanks Katie -- would you like me to shepherd the sub-tasks? Do you have an example of the process I can copy? |
I would! Some examples are in the links, but this is still a prototype of the process we'd like to build, so not everything has something to go off of. The tasks are roughly in order -- eg if the statistical review doesn't go off, we don't want to move forward until that's been fixed. Jingjing should handle the statistical review. Anyone can handle putting together the signal and source naming recommendations (just send Roni a txt when he should look at the approvals doc and when it's needed by -- 1.11 isn't on the calendar yet so several days allowance is fine). Adding to automation will coordinate with Brian. Visual review, signal description pop-up text, and map release notes will coordinate with Chris. API documentation and mailing list notification will coordinate with Kari, Jingjing, and Alex. |
Fabulous thanks! @jingjtang looks like you are up for the first task. Ping me if you need help and I can escalate. |
@benjaminysmith Simple comparison between As for the correlation app, there does not seem to be a weekly response to use. A simple geo-wise Spearman correlation analysis is shown below. Note that no lag is considered (according to the figures in the PDF). The drop in the last few weeks is reasonable, since death reports usually lag significantly. According to NCHS, this delay can range from 1 week to 8 weeks or more, depending on the jurisdiction and cause of death. |
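For readers who want to reproduce this kind of check, here is a minimal sketch of a per-week rank correlation across locations. It assumes that "geo-wise" means the correlation is computed across states separately for each week, with no lag. The record layout and the function names (`average_ranks`, `spearman`, `spearman_by_week`) are hypothetical and are not taken from the indicator code.

```python
from collections import defaultdict
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns None if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_by_week(records, min_geos=3):
    """records: iterable of (week, geo, signal_value, reference_value).

    Pairs with a missing (None) value are skipped rather than treated as 0.
    Weeks with fewer than min_geos usable pairs map to None.
    """
    by_week = defaultdict(list)
    for week, _geo, a, b in records:
        if a is not None and b is not None:
            by_week[week].append((a, b))
    result = {}
    for week, pairs in sorted(by_week.items()):
        if len(pairs) < min_geos:
            result[week] = None
            continue
        xs, ys = zip(*pairs)
        result[week] = spearman(list(xs), list(ys))
    return result
```

Skipping missing pairs instead of zero-filling them matters here, because zero-filled values would pull the rank correlation down in exactly the weeks where reporting is incomplete.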
@krivard As for the signal names, the current ones are:
as described here what kind of txt do you need? |
Although no geo aggregation is needed, we need population info for |
Contacted @korlaxxalrok already for the automation. Will work with him once he starts on this.
|
@korlaxxalrok for automation |
API documentation PR here |
The correlation looks pretty convincing to me. @RoniRos can give final approval on this, but otherwise this looks good to go. If the names are also good I think we have everything we need aside from running it in automation. |
Visual View for all of the signals here |
@benjaminysmith I'm very sorry this has been blocked on me. I needed to find time to review the NCHS data definitions. |
Based on the NCHS explanations and our conventions, the signals need to be named as follows:
By way of explanation: it turns out the classifications are _not_ based on the primary cause of death, but on any cause of death, of which there are often several (several ICD codes per individual death). So the logic has to be spelled out more clearly. Some choices can be debated. I'd love to hear your thoughts @krivard @benjaminysmith @jingjtang :
|
@jingjtang please modify the description to explicitly state that the classification is based on all the codes on the death certificate (not just the 'primary cause of death'). |
@benjaminysmith Can you please point me to the correlations? |
Ok, thanks! Then let's change to 'incidence' throughout. Yes, I saw this plot, I thought Ben was referring to additional ones. |
@jingjtang -- what do we need to do to resolve this? Do you need additional input? |
Thanks @jingjtang .
On inspecting your comparison chart (covid_deaths_num.pdf), I think I figured out the reason. There are some surprising zeros in the NCHS counts. Specifically:
So @benjaminysmith I think my question about the glitchy correlation is resolved: the problem is NCHS, not USAFacts. But this is probably something we should try to resolve. For example, why does our NCHS signal report all zeros for AK when the CDC table shows a cumulative 74 deaths? And do the prolonged zeros for HI, MT & others correspond to CDC's reporting? |
Is it likely that this is related to the file format problems with the early NCHS data files dated up to ~May? I inquired with Matthew but never got a response, and we eventually just went forward with the better file format that became available starting June. (Sparse) context in this thread. |
The weak correlations for w11-13 may be explained that way, but not the zeros for AK, WY, HI, MT. I think we need to resolve this before we publish -- we shouldn't publish zeros for AK, for example. |
@RoniRos Just fixed a bug in the pipeline where I had incorrectly applied a one-week shift (#528). After fixing this issue:
So, based on the comparison with the USAFacts incidence numbers, we should treat the NCHS weekly reports as incidence numbers. The raw dataset contains NAs. I treated them as 0 when I first wrote the pipeline, but that is incorrect, especially for states with many missing values such as "AK", "DE", "HI", "ME", "NC", "WY", whether over a certain time range or the entire published time range. I should definitely keep those NAs. It's correct that Table 2 shows 74 deaths for AK, but it is also correct that Table 1 only ever has 0 or NA for AK (similarly for WY, where the first non-zero/non-NA value appears in W42, with week-ending date 2020-10-17). @RoniRos You can directly download the most recent raw dataset from here (click "export" and then "CSV"). So it seems they have 74 recorded COVID-related deaths through 11-14 but are not sure which week to assign those deaths to. |
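Since a one-week shift is easy to introduce when converting week numbers to dates, here is a small sketch of the week-number to week-ending-date mapping. It assumes that "W42" refers to the MMWR (epi) week, where weeks run Sunday through Saturday and week 1 is the week containing January 4. The function name `mmwr_week_end` is hypothetical.

```python
from datetime import date, timedelta


def mmwr_week_end(year, week):
    """Return the Saturday that ends the given MMWR week.

    MMWR week 1 is the first Sunday-to-Saturday week with at least four
    days in the calendar year, which is the week containing January 4.
    """
    jan4 = date(year, 1, 4)
    # date.weekday(): Monday=0 .. Sunday=6, so this is days since Sunday.
    days_since_sunday = (jan4.weekday() + 1) % 7
    week1_start = jan4 - timedelta(days=days_since_sunday)
    return week1_start + timedelta(weeks=week - 1, days=6)


print(mmwr_week_end(2020, 42))  # 2020-10-17
```

Under that assumption, week 42 of 2020 ends on 2020-10-17, which matches the week-ending date quoted above.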
I think I figured out the reason for the discrepancy: any time the value is missing (N/A), it corresponds to a positive count less than 10 (i.e., somewhere in 1-9), which is censored by privacy rules. For low-population, low-covid states like Alaska or Wyoming, this happens most of the time. For some of the other states, it happens occasionally. |
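To make the consequence concrete, here is a minimal sketch of parsing suppressed counts as missing rather than zero, and of bounding a cumulative total when some weeks are suppressed. It assumes a blank cell means a censored count somewhere in 1-9, as described above. The function names and the example series are hypothetical, not real NCHS values.

```python
def parse_count(cell):
    """Parse a raw CSV cell; blank cells are kept as None, never coerced to 0."""
    cell = cell.strip()
    return int(cell) if cell else None


def cumulative_bounds(weekly_counts, low=1, high=9):
    """Bound the total when None marks a suppressed count in [low, high]."""
    known = sum(c for c in weekly_counts if c is not None)
    n_suppressed = sum(1 for c in weekly_counts if c is None)
    return known + low * n_suppressed, known + high * n_suppressed


# Hypothetical weekly series for a small state: mostly suppressed.
weekly = [parse_count(c) for c in ["0", "", "", "12", ""]]
print(weekly)                     # [0, None, None, 12, None]
print(cumulative_bounds(weekly))  # (15, 39)
```

This is why a weekly table can show only 0 or NA for a state while a cumulative table shows a positive total: zero-filling the suppressed weeks understates the sum, while the true total lies somewhere inside the bounds.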
I didn't find a download link for Table 2, but here is a county-level table for COVID-related deaths only. According to their API docs, we should be able to specify the |
@benjaminysmith Draft for the signal description pop-up text added under |
when I looked at Socrata before, I'd forgotten about the frontend support for weeklies 😱 we may end up pushing this to 1.12 in the map, but at least we can release in the API with the current batch. |
Summarizing the last few comments:
|
Will this signal have |
@benjaminysmith Yes. Since we currently don't have another data source that can fill in the values missing due to privacy rules, we want to keep them as NaN, at least for now, I think. @krivard |
Regarding map support, we'd need to know an ETA of when this would be available in staging to schedule the work to pull it in. |
API support sounds like a blocker for release. Filed a request for this in cmu-delphi/covidcast#305 |
* The API server has no problem with a weekly time type, and neither do the low-level clients in |
Representing the censored values as N/A seems like the most reasonable thing to do for now. Eventually, we may want to distinguish them from truly missing. Do our tools support different types of missingness? |
Also, we need to think of how to represent on the map zero vs. censored vs. unknown. |
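One possible way to carry that three-way distinction through to a client is sketched below. This is purely illustrative: the `Missingness` enum, the `classify` helper, and the row layout are hypothetical and do not describe any existing API or client behavior. It assumes a missing row means "unknown" and a present row with a blank count means "censored".

```python
from enum import Enum


class Missingness(Enum):
    ZERO = "zero"          # a reported count of exactly 0
    POSITIVE = "positive"  # a reported count above 0
    CENSORED = "censored"  # row exists but the count is suppressed
    UNKNOWN = "unknown"    # no row at all for this geo and week


def classify(row):
    """row is None when no record exists, else a dict with a 'count' key."""
    if row is None:
        return Missingness.UNKNOWN
    count = row.get("count")
    if count is None:
        return Missingness.CENSORED
    return Missingness.ZERO if count == 0 else Missingness.POSITIVE
```

A map layer could then choose a distinct style per category instead of collapsing everything that is not a positive number into one color.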
Verdict from leads: Go ahead with the release in the API; the client and map will catch up. @RoniRos We do not yet encode different kinds of missingness, it's on the list for December. |
@krivard This is on prod now. It should have delivered some data yesterday, but otherwise will run daily, build up its cache, and then output on Mondays. |
The API docs got lost along the way but I recovered them in cmu-delphi/delphi-epidata#315. @jingjtang would you draft a mailing list announcement and drop it in a comment here? Then @Akvannortwick can polish it up for distribution and maybe we can announce this puppy tomorrow. |
You can also draft an email here and I will make changes and suggestions. |
@krivard @Akvannortwick Apart from the descriptions for all related signals, is there anything else to add? Also, I left some comments on cmu-delphi/delphi-epidata#315 |
Released today! |
This is a new indicator that's completed its first phase of development and is ready to consider for public release. This indicator tracks the number of covid, pneumonia, and all deaths, as well as the percentage of expected deaths, as published by the NCHS. Signal is per week, but the estimate is updated daily.
Edit: we will not pursue map release at this time
Statistical review (usually correlations)
Signal / source name review (usually Roni)
Add to Delphi Automation
API support for weekly data
Visual review
Signal description pop-up text drafting and review
Map release notes
API documentation and/or changelog
API mailing list notification