Useful missing data data structure and visualisation #165
library(tidyverse)
library(naniar)
shadow_gather <- function(shadow_data){
  shadow_data %>%
    tidyr::gather(key = "variable",
                  value = "value",
                  -which_are_shadow(.)) %>%
    tidyr::gather(key = "variable_NA",
                  value = "value_NA",
                  which_are_shadow(.))
}
ocean_imp_mean <- oceanbuoys %>%
  bind_shadow(only_miss = TRUE) %>%
  impute_mean_all()
gathered_ocean_imp_mean <- shadow_gather(ocean_imp_mean)
gathered_ocean_imp_mean
#> # A tibble: 17,664 x 4
#> variable value variable_NA value_NA
#> <chr> <dbl> <chr> <chr>
#> 1 year 1997 sea_temp_c_NA !NA
#> 2 year 1997 sea_temp_c_NA !NA
#> 3 year 1997 sea_temp_c_NA !NA
#> 4 year 1997 sea_temp_c_NA !NA
#> 5 year 1997 sea_temp_c_NA !NA
#> 6 year 1997 sea_temp_c_NA !NA
#> 7 year 1997 sea_temp_c_NA !NA
#> 8 year 1997 sea_temp_c_NA !NA
#> 9 year 1997 sea_temp_c_NA !NA
#> 10 year 1997 sea_temp_c_NA !NA
#> # ... with 17,654 more rows
ggplot(gathered_ocean_imp_mean,
       aes(x = value,
           fill = value_NA)) +
  geom_histogram() +
  facet_grid(variable ~ variable_NA,
             scales = "free_x",
             switch = "y")
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2018-08-13 by the reprex package (v0.2.0).

Some notes on implementation

Naming

The function name should be

Methods

Options for extra variables

There should be options to leave certain variables in the dataframe untouched. For example, the

Notes on the visualisation method

I spent a while trying to NOT use

This smells like a bit of a leaky abstraction. There should be a nice way to get only the variables and their imputed values into shape for this kind of visualisation. This means getting the visualisations on the diagonal, that is, filtering to rows where variable_NA is variable followed by "_NA".

Some work so far on this:

gathered_ocean_imp_mean %>%
  filter(variable %in% c("air_temp_c",
                         "humidity",
                         "sea_temp_c")) %>%
  mutate(temp = paste0(variable,"_NA")) %>%
  filter(variable_NA == temp)
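
On the earlier point about leaving certain variables untouched, here is a minimal sketch of what such an option might look like. The function name `shadow_gather_keep()` and the `keep` argument are hypothetical and not part of naniar. The sketch also assumes that shadow columns can be identified by an "_NA" suffix, and it uses a small toy data frame rather than `oceanbuoys` so that it runs without naniar.

```r
library(tidyr)

# Hypothetical variant of shadow_gather(); `keep` is an assumed argument name.
shadow_gather_keep <- function(shadow_data, keep = character()) {
  all_cols <- names(shadow_data)
  # Assumption: shadow columns are the ones whose names end in "_NA".
  shadow_cols <- all_cols[endsWith(all_cols, "_NA")]
  # Everything that is neither a shadow column nor kept is gathered as a value.
  value_cols <- setdiff(all_cols, c(shadow_cols, keep))

  long <- pivot_longer(shadow_data,
                       cols = all_of(value_cols),
                       names_to = "variable",
                       values_to = "value")
  pivot_longer(long,
               cols = all_of(shadow_cols),
               names_to = "variable_NA",
               values_to = "value_NA")
}

# Toy data standing in for the output of bind_shadow() plus imputation.
toy <- data.frame(
  year        = c(1997, 1998),
  temp        = c(20, 21),
  humidity    = c(80, 85),
  temp_NA     = c("!NA", "NA"),
  humidity_NA = c("!NA", "!NA"),
  stringsAsFactors = FALSE
)

long_toy <- shadow_gather_keep(toy, keep = "year")
long_toy
```

With `keep = "year"`, `year` stays as its own column instead of being stacked into `variable`, so it would still be available for faceting or grouping.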
OK so here is the progress on this:

library(tidyverse)
library(naniar)
ocean_imp_mean <- oceanbuoys %>%
  bind_shadow(only_miss = TRUE) %>%
  impute_mean_all()
gathered_ocean_imp_mean <- shadow_long(ocean_imp_mean)
gathered_ocean_imp_mean %>%
  filter(variable %in% c("air_temp_c",
                         "humidity",
                         "sea_temp_c")) %>%
  filter(variable_NA == paste0(variable,"_NA")) %>%
  ggplot(aes(x = value,
             fill = value_NA)) +
  geom_histogram() +
  facet_wrap(~variable_NA)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2018-08-13 by the reprex package (v0.2.0).

I think the abstraction here would be to specify the variables that you want to focus on, and the data would then be filtered down to those.
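
A rough sketch of that abstraction, assuming the long format with `variable`, `value`, `variable_NA` and `value_NA` columns shown above. The helper name `focus_on_variables()` and its `focus` argument are made up for illustration and are not naniar functions. It keeps only the "diagonal" rows, where each variable is paired with its own shadow, and optionally restricts the result to the variables of interest.

```r
library(dplyr)

# Hypothetical helper; the name and the `focus` argument are assumptions.
focus_on_variables <- function(gathered, focus = NULL) {
  # Keep only rows where a variable is paired with its own shadow column.
  out <- filter(gathered, variable_NA == paste0(variable, "_NA"))
  # Optionally restrict to the variables the user wants to look at.
  if (!is.null(focus)) {
    out <- filter(out, variable %in% focus)
  }
  out
}

# Toy long-format data standing in for the output of shadow_long().
toy_long <- data.frame(
  variable    = c("air_temp_c", "air_temp_c", "humidity", "humidity"),
  value       = c(25, 25, 80, 80),
  variable_NA = c("air_temp_c_NA", "humidity_NA", "air_temp_c_NA", "humidity_NA"),
  value_NA    = c("!NA", "NA", "!NA", "NA"),
  stringsAsFactors = FALSE
)

diagonal_all <- focus_on_variables(toy_long)
diagonal_humidity <- focus_on_variables(toy_long, focus = "humidity")
```

The filtered result could then be passed straight to `ggplot()` with `facet_wrap(~variable_NA)`, as in the reprex above.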
Actually I just added that filtering step to the

library(tidyverse)
library(naniar)
ocean_imp_mean <- oceanbuoys %>%
  bind_shadow(only_miss = TRUE) %>%
  impute_mean_all()
gathered_ocean_imp_mean <- shadow_long(ocean_imp_mean)
gathered_ocean_imp_mean
#> # A tibble: 17,664 x 4
#> variable value variable_NA value_NA
#> <chr> <dbl> <chr> <chr>
#> 1 year 1997 sea_temp_c_NA !NA
#> 2 year 1997 sea_temp_c_NA !NA
#> 3 year 1997 sea_temp_c_NA !NA
#> 4 year 1997 sea_temp_c_NA !NA
#> 5 year 1997 sea_temp_c_NA !NA
#> 6 year 1997 sea_temp_c_NA !NA
#> 7 year 1997 sea_temp_c_NA !NA
#> 8 year 1997 sea_temp_c_NA !NA
#> 9 year 1997 sea_temp_c_NA !NA
#> 10 year 1997 sea_temp_c_NA !NA
#> # ... with 17,654 more rows
gathered_ocean_imp_mean %>%
  ggplot(aes(x = value,
             fill = value_NA)) +
  geom_histogram() +
  facet_wrap(~variable_NA)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2018-08-13 by the reprex package (v0.2.0).
Created on 2018-05-23 by the reprex package (v0.2.0).