Incorrect US Data #25

Closed
mikebarton23 opened this issue Mar 11, 2020 · 5 comments
Labels
bug Something isn't working

Comments

@mikebarton23

Seems like this is likely on Johns Hopkins' side, but I'm seeing inconsistencies in the day-to-day data for the U.S. It looks like they've decided to aggregate by state now in addition to by county, and some of the original county data was left in. For example, "Washington" had 0 cases on 3/9/2020 but has 267 listed on 3/10/2020. However, data from counties within Washington is still being counted. That leads to double counting in some scenarios: the 3/10/2020 count of confirmed cases in the US comes out to 1,670 using these new numbers, which is off by quite a bit.
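For anyone who wants to check their own copy of the data, here is a rough sketch of how the overlap could be detected. It assumes rows shaped as `(place, date, confirmed)` tuples and a small state-code lookup. The function name, the lookup, and the county figure are made up for illustration; only the 267 for Washington comes from the numbers above.

```python
# Hypothetical lookup: state code -> state name (only one entry shown).
STATE_NAMES = {"WA": "Washington"}

def double_counted_states(rows, date):
    """Return states that have both a state-level row and county rows on `date`."""
    state_level, county_level = set(), set()
    for place, row_date, confirmed in rows:
        # Ignore other dates and rows that contribute nothing to the total.
        if row_date != date or confirmed == 0:
            continue
        if ", " in place:
            # County/city rows look like "King County, WA".
            code = place.rsplit(", ", 1)[1]
            if code in STATE_NAMES:
                county_level.add(STATE_NAMES[code])
        else:
            # Rows without a comma are treated as state-level entries.
            state_level.add(place)
    return state_level & county_level

rows = [
    ("Washington", "2020-03-10", 267),      # figure quoted in this issue
    ("King County, WA", "2020-03-10", 83),  # made-up county figure
    ("Washington", "2020-03-09", 0),
    ("King County, WA", "2020-03-09", 83),  # made-up county figure
]
flagged = double_counted_states(rows, "2020-03-10")
```

Any state that shows up in `flagged` is contributing to the US total twice on that date.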

Doubt there's anything you can do here but thought I'd bring it to your attention.

Attaching a sheet I made that shows the largest discrepancies.

JHU Data Errors.xlsx

ExpDev07 added the bug label Mar 11, 2020
@ralyodio

https://lionbridge.ai/datasets/coronavirus-datasets-from-every-country/

Maybe we need a new data set since JHU is unreliable now.

@ExpDev07
Owner

I wouldn’t exactly call them unreliable. Many major news outlets (including local and national ones in my country) are still quoting their data. There are probably issues open right now that address this.

@mikebarton23
Author

I checked out the JHU GitHub page and found an announcement about data formats moving forward: Issue #504

Essentially, they're aggregating by state now but still kept some of the city data in there, which led to lots of double counting. I got around this on the historical side by taking the last two letters of the city name (e.g. "SC" from "Columbia, SC") and joining that to a table I created with state codes and names. Then I aggregated all of the states' data from 3/9 moving backward in time. That helped give some continuity to the time series data. A huge pain, but I thought I'd share my solution.
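A minimal sketch of that workaround in Python, for anyone who wants to try the same thing. It is not the actual spreadsheet logic: the row layout, the `STATE_CODES` table, the function name, and all case counts are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical lookup table: state code -> state name (only two entries shown).
STATE_CODES = {"SC": "South Carolina", "WA": "Washington"}

def backfill_state_totals(rows, cutoff):
    """Roll city/county rows up to state totals for dates on or before `cutoff`.

    rows: iterable of (place, iso_date, confirmed) tuples.
    ISO date strings compare correctly as plain strings.
    """
    totals = defaultdict(int)
    for place, date, confirmed in rows:
        # Only historical city/county rows such as "Columbia, SC" are rolled up.
        if date > cutoff or ", " not in place:
            continue
        code = place.strip()[-2:]  # last two letters, e.g. "SC"
        state = STATE_CODES.get(code)
        if state is not None:  # skip rows whose code is not in the table
            totals[(state, date)] += confirmed
    return dict(totals)

# Made-up sample rows, not real JHU figures.
sample = [
    ("King County, WA", "2020-03-09", 83),
    ("Snohomish County, WA", "2020-03-09", 31),
    ("Columbia, SC", "2020-03-09", 2),
    ("King County, WA", "2020-03-10", 83),  # after the cutoff, ignored
]
state_totals = backfill_state_totals(sample, "2020-03-09")
```

Rows whose last two characters are not a known state code are skipped rather than guessed at, so they would need to be handled by hand.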

@ExpDev07
Owner

Assuming that they’re still working on it, it’ll get fixed soon and show the correct data.

@mikebarton23
Author

It didn't necessarily sound like they were planning on fixing it based on what I read. But your API is working as intended so it's nothing on your end.

Really appreciate you building out this API, by the way. It has been a huge help!

Kilo59 added a commit to codedawi/coronavirus-tracker-api that referenced this issue May 20, 2020

3 participants