Represent relative missingness across time for many variables #254
Comments
Thank you so much for this! I think it is roughly the same as the Although, your code was substantially faster than mine! So this reminds me to take a look at how I've implemented this, as there are some speed gains to be had here!
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
library(tidyr)
library(naniar)
who_na_counts <- who %>%
  group_by(year) %>%
  summarise_at(vars(-c(country:iso3)), ~ sum(is.na(.x))) %>%
  ungroup() %>%
  pivot_longer(
    -year,
    names_to = "variable",
    values_to = "n_country_missing"
  )
who_na_counts
#> # A tibble: 1,904 x 3
#>     year variable     n_country_missing
#>    <int> <chr>                    <int>
#>  1  1980 new_sp_m014                210
#>  2  1980 new_sp_m1524               210
#>  3  1980 new_sp_m2534               210
#>  4  1980 new_sp_m3544               210
#>  5  1980 new_sp_m4554               210
#>  6  1980 new_sp_m5564               210
#>  7  1980 new_sp_m65                 210
#>  8  1980 new_sp_f014                210
#>  9  1980 new_sp_f1524               210
#> 10  1980 new_sp_f2534               210
#> # … with 1,894 more rows
naniar_who_na_counts <- who %>%
  group_by(year) %>%
  miss_var_summary()
naniar_who_na_counts
#> # A tibble: 2,006 x 4
#> # Groups:   year [34]
#>     year variable     n_miss pct_miss
#>    <int> <chr>         <int>    <dbl>
#>  1  1980 new_sn_m014     212      100
#>  2  1980 new_sn_m1524    212      100
#>  3  1980 new_sn_m2534    212      100
#>  4  1980 new_sn_m3544    212      100
#>  5  1980 new_sn_m4554    212      100
#>  6  1980 new_sn_m5564    212      100
#>  7  1980 new_sn_m65      212      100
#>  8  1980 new_sn_f014     212      100
#>  9  1980 new_sn_f1524    212      100
#> 10  1980 new_sn_f2534    212      100
#> # … with 1,996 more rows
gg_miss_fct(who, year)
Created on 2020-05-08 by the reprex package (v0.3.0)
A few differences:
In terms of additions to
Let me know what you think, happy to try and improve
Thanks again!
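For anyone reading along with dplyr 1.0.0 or later: the `summarise_at()` call in the reprex above can also be written with `across()`, and swapping `sum()` for `mean()` gives a percentage that is comparable to the `pct_miss` column from `miss_var_summary()`. This is only a sketch, not the code from either comment; the names `who_na_pct` and `pct_country_missing` are made up for illustration.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Percentage of rows (countries) with a missing value, per year and variable.
# The identifier columns country, iso2 and iso3 are excluded, as in the
# summarise_at() version; the grouping column year is skipped by across().
who_na_pct <- who %>%
  group_by(year) %>%
  summarise(across(-c(country:iso3), ~ 100 * mean(is.na(.x)))) %>%
  pivot_longer(
    -year,
    names_to = "variable",
    values_to = "pct_country_missing"
  )

head(who_na_pct)
```

Because `summarise()` drops the single grouping level here, no explicit `ungroup()` is needed before `pivot_longer()`.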
Oh I hadn't thought about putting Regarding Although it wasn't very clear from my original message, I guess what I was trying to say is that it would be nice to see more visualisation options for longitudinal data in
(P.S. Glad to know my code helped!)
Hi! I leave below an example of a type of plot that I have often used when exploring the missingness pattern of longitudinal data. Basically, it gives a 'big picture' overview of the relative number of missing values per panel (in this case, country) across time and for many (or all) variables. It can be read 'horizontally' to identify the overall time frame where most variable information is available, or 'vertically' to identify variables with little information across time.
I don't know if my use case is very common, but it is somewhat related to #188 and I thought I might share it in case it might inspire a future feature. It is also similar to gg_miss_fct() in that it uses a fill geom to represent relative missingness.
Created on 2020-04-30 by the reprex package (v0.3.0)
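The kind of overview described above could be sketched roughly as follows. This is not the original reprex from this post; it is a minimal illustration that assumes a `geom_tile()` heatmap with year on the x-axis and variable on the y-axis, and the names `who_prop_miss`, `prop_country_missing` and `p` are hypothetical.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(ggplot2)

# Proportion of rows (countries) with a missing value, per year and variable.
who_prop_miss <- who %>%
  select(-c(country:iso3)) %>%
  pivot_longer(-year, names_to = "variable", values_to = "value") %>%
  group_by(year, variable) %>%
  summarise(prop_country_missing = mean(is.na(value)), .groups = "drop")

# One tile per year/variable pair, filled by the proportion missing.
p <- ggplot(
  who_prop_miss,
  aes(x = year, y = variable, fill = prop_country_missing)
) +
  geom_tile() +
  scale_fill_viridis_c(labels = scales::percent) +
  labs(x = "Year", y = "Variable", fill = "Countries missing")

p
```

Using a proportion rather than a raw count keeps years with different numbers of reporting countries comparable on the same fill scale.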