Searching for different kinds of missing values is really tough, and annoying.
If you've got values like -99 in your data when they shouldn't be there, or when they should be encoded as missing, it can be difficult to ascertain whether they are there and, if so, where they are.
The idea, then, is to create some functions that allow users to search for particular occurrences of these values across their variables. This would also allow users to specify their own patterns. In the future this could be visualised with a function like vis_miss, but one that handles a given pattern (sort of like what I proposed for vis_expect).
library(purrr)
library(tidyr)
dat_ms <- tibble::tribble(
  ~x,  ~y,    ~z,
  1,   "A",   -100,
  3,   "N/A", -99,
  NA,  NA,    -98,
  -99, "E",   -101,
  -98, "F",   -1
)
miss_scan_count <- function(data, search) {
  # if there is only one value to search
  if (length(search) == 1) {
    map_df(data, ~length(grep(search, x = .))) %>%
      # return the dataframe with the columns "Variable" and "n"
      tidyr::gather(key = "Variable",
                    value = "n")
  # but if there is more than one, we need to combine the search terms
  } else if (length(search) > 1) {
    map_df(data, ~length(grep(paste0(search, collapse = "|"), x = .))) %>%
      tidyr::gather(key = "Variable",
                    value = "n")
  }
}
miss_scan_count(dat_ms, "-99")
#> # A tibble: 3 x 2
#>   Variable     n
#>   <chr>    <int>
#> 1 x            1
#> 2 y            0
#> 3 z            1
miss_scan_count(dat_ms, c("-99", "-98"))
#> # A tibble: 3 x 2
#>   Variable     n
#>   <chr>    <int>
#> 1 x            2
#> 2 y            0
#> 3 z            2
miss_scan_count(dat_ms, c("-99", "-98", "N/A"))
#> # A tibble: 3 x 2
#>   Variable     n
#>   <chr>    <int>
#> 1 x            2
#> 2 y            1
#> 3 z            2
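The counts say whether the values are present, but not where. As a rough sketch of the "where" part, something like the following could report the row and variable of each hit. The name `miss_scan_where` is hypothetical and not part of any package, and it assumes exact matching on the character form of each value rather than the regular-expression matching used above. It uses base R only and rebuilds the example data as a plain data frame so it runs on its own.

```r
# Hypothetical helper (an assumption, not an existing function): report the
# variable and row of every cell that exactly equals one of the search values.
miss_scan_where <- function(data, search) {
  search <- as.character(search)
  hits <- lapply(names(data), function(nm) {
    # compare as character so numeric and character columns are treated alike;
    # real NA values never match because NA %in% search is FALSE
    rows <- which(as.character(data[[nm]]) %in% search)
    data.frame(Variable = rep(nm, length(rows)),
               row = rows,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)
}

# same values as dat_ms above, as a base data frame
dat_ms <- data.frame(
  x = c(1, 3, NA, -99, -98),
  y = c("A", "N/A", NA, "E", "F"),
  z = c(-100, -99, -98, -101, -1),
  stringsAsFactors = FALSE
)

miss_scan_where(dat_ms, c("-99", "N/A"))
```

Exact matching is a design choice here: it relies on how `as.character()` prints numbers, so very large or fractional values might need a different comparison.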
Another possibility is that it would include common missing data specifications ("na", "N A", "n a", "N/A", "Not Available", "missing", -99, -98, -9, etc.), but it's hard to say what is "common"!
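One way this might look is a default vector of candidate values that the user can override. The sketch below is only illustrative: `common_na_strings` and `miss_scan_common` are hypothetical names, and the list is just the examples mentioned here, not an agreed set of "common" values. It matches whole values exactly, because with pattern matching a short entry like "-9" would also hit "-99" and "-98".

```r
# Assumption: illustrative list only, copied from the examples in this issue;
# not an exhaustive or agreed-upon set of "common" missing value codes.
common_na_strings <- c("na", "N A", "n a", "N/A", "Not Available", "missing",
                       "-99", "-98", "-9")

# Hypothetical helper: count, per variable, the cells that exactly equal
# one of the candidate values (defaults to the list above).
miss_scan_common <- function(data, search = common_na_strings) {
  n <- vapply(data,
              function(col) sum(as.character(col) %in% search),
              integer(1))
  data.frame(Variable = names(data), n = unname(n), stringsAsFactors = FALSE)
}

# same values as dat_ms above, as a base data frame
dat_ms <- data.frame(
  x = c(1, 3, NA, -99, -98),
  y = c("A", "N/A", NA, "E", "F"),
  z = c(-100, -99, -98, -101, -1),
  stringsAsFactors = FALSE
)

miss_scan_common(dat_ms)
```

Users with their own conventions could pass a different `search` vector, which sidesteps having to decide what counts as "common" up front.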