[FEATURE] Multiple Choice to Single Column Function #194

Closed
5 tasks done
rsh52 opened this issue May 30, 2024 · 6 comments · Fixed by #196
Assignees
Labels
enhancement New feature or request

Comments

@rsh52
Collaborator

rsh52 commented May 30, 2024

Feature Request Description

A common use case is consolidating multiple choice fields (i.e. checkbox fields) into a single column, such as when powering Table 1s for manuscript reporting.

Proposed Solution

This function, placeholder name reduce_multi_to_single_column(), will be the first REDCapTidieR analytic tool that users can apply to columns in their extracted tibbles.

It should:

  • Be capable of using tidyselect functions (e.g. starts_with("race"))
  • Give users the option to specify what multiple selections are changed to (e.g. "Multiple", "Other", etc.)
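To illustrate the first requirement, here is a minimal sketch of how a function could accept a tidyselect expression and resolve it to column names. It assumes the tidyselect and rlang packages are installed; select_checkbox_cols() and the demo data frame are hypothetical names used only for illustration, not part of REDCapTidieR.

```r
# Hypothetical helper: resolve a tidyselect expression against a data frame
# and return the names of the matching columns.
select_checkbox_cols <- function(data, cols) {
  # eval_select() returns a named integer vector of column positions
  positions <- tidyselect::eval_select(rlang::enquo(cols), data)
  names(positions)
}

# Made-up data with two checkbox columns for a "race" field
demo <- data.frame(
  record_id = 1:2,
  race___1  = c(TRUE, FALSE),
  race___2  = c(FALSE, TRUE),
  ethnicity = c("a", "b")
)

selected <- select_checkbox_cols(demo, tidyselect::starts_with("race"))
print(selected)
```

The helper is namespaced as tidyselect::starts_with() here so the sketch runs without attaching any package.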

Additional Context

This was prompted by the request in #192 and should be a more generalizable solution.

Checklist

  • The issue is atomic
  • The issue description is documented
  • The issue title describes the problem succinctly
  • Developers are assigned to the issue
  • Labels are assigned to the issue
@ezraporter
Collaborator

The stuff in 96c3309 looks like a good start. I didn't look closely at the code and mainly focused on the API.

My biggest reaction is that the current API takes a supertibble as input and returns a tibble, and I'm not sure that's right. I think we want data manipulation functions like this to be pipe-friendly, which requires that inputs and outputs are the same type of thing.

I think the best way of doing this is to have this function modify a tibble within the supertibble and return the whole thing. To get the current behavior you'd need to do something like:

supertbl |>
  reduce_multi_to_single_column() |>
  extract_tibble()

but this buys us composability:

supertbl |>
  reduce_multi_to_single_column() |>
  reduce_multi_to_single_column() |>
  some_other_transformation() |>
  ...
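As a rough illustration of the "same type in, same type out" idea, the sketch below uses a base R data frame with a list column as a stand-in for a supertibble. The functions add_checked_count() and extract_sketch() are hypothetical placeholders, not the proposed function or the real extract_tibble(); the point is only that a function returning the whole container can be chained.

```r
# Stand-in supertibble: one row per instrument, data held in a list column.
supertbl <- data.frame(redcap_form_name = "nonrepeat")
supertbl$redcap_data <- list(
  data.frame(
    study_id  = 1:2,
    multi___1 = c(TRUE, FALSE),
    multi___2 = c(TRUE, TRUE)
  )
)

# Hypothetical transformation: modifies one data tibble inside the container
# and returns the whole container, so the output type matches the input type.
add_checked_count <- function(supertbl, tbl, prefix, values_to) {
  row  <- match(tbl, supertbl$redcap_form_name)
  data <- supertbl$redcap_data[[row]]
  cols <- grep(paste0("^", prefix, "___"), names(data), value = TRUE)
  data[[values_to]] <- unname(rowSums(data[cols] == 1))
  supertbl$redcap_data[[row]] <- data
  supertbl
}

# Hypothetical stand-in for extracting one data tibble at the end of a pipeline.
extract_sketch <- function(supertbl, tbl) {
  supertbl$redcap_data[[match(tbl, supertbl$redcap_form_name)]]
}

result <- supertbl |>
  add_checked_count("nonrepeat", "multi", "n_multi") |>
  add_checked_count("nonrepeat", "multi", "n_multi_again") |>
  extract_sketch("nonrepeat")

print(result)
```

Because each step returns the container, extraction only has to happen once, at the end of the chain.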

Naming thoughts

Maybe we should call this unite_checkbox() in reference to tidyr::unite()?

For parameter names what about:
cols_to -> values_to
no_val -> values_fill
multi_val -> multi_value_label

The first 2 are inspired by pivot_* naming conventions in tidyr

@skadauke
Collaborator

skadauke commented Jul 12, 2024 via email

@rsh52
Collaborator Author

rsh52 commented Jul 12, 2024

My biggest reaction is that the current API takes a supertibble as input and returns a tibble and I'm not sure that's right. I think we want data manipulation functions like this to be pipe-friendly which requires that inputs and outputs are the same type of thing.

For the API change, I think this is easily reworkable if we agree on what it should intake and output. I definitely see what you're saying. My concern with returning a supertibble is that you don't actually see the changes from the output of the function, it's sort of "masked" inside of the data tibbles. But maybe that's not a big issue here?

Either way, the function needs access to the metadata raw/label values associated with the checkboxes to be united, so I don't see a way around having users supply the supertibble. Otherwise they'd have to supply the data tibble and metadata tibble separately.

Naming thoughts

I like the naming much better, these all make sense to me.

@rsh52
Collaborator Author

rsh52 commented Jul 12, 2024

That said, I agree with your point and actually think this function should take a tibble and return a modified tibble. So the workflow would be
supertbl |> extract_tibble(x) |> reduce_multi_to_single_column(...)

@skadauke This also makes some sense, but then users would still need to supply the metadata separately. That's what led me to wrapping the extract_() internally.

@skadauke
Collaborator

skadauke commented Jul 12, 2024 via email

@rsh52
Collaborator Author

rsh52 commented Jul 12, 2024

What metadata is needed for the transformation?

The general rule is to consolidate checkboxes under one column, showing the raw/label value associated with the checkbox if only one value is selected OR a custom value (e.g. "multiple" / "many") if multiple are selected. There's no way to grab these values from the data tibble alone, since its checkbox columns hold only 1s and 0s or TRUEs and FALSEs. Ex:

> nonrepeat_data
# A tibble: 3 × 4
  study_id multi___1 multi___2 multi___3
     <dbl> <lgl>     <lgl>     <lgl>    
1        1 TRUE      FALSE     FALSE    
2        2 TRUE      TRUE      FALSE    
3        3 FALSE     FALSE     FALSE    
> nonrepeat_metadata
# A tibble: 4 × 2
  field_name select_choices_or_calculations
  <chr>      <chr>                         
1 study_id   NA                            
2 multi___1  1, Red | 2, Yellow | 3, Blue  
3 multi___2  1, Red | 2, Yellow | 3, Blue  
4 multi___3  1, Red | 2, Yellow | 3, Blue  
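To make the rule concrete, here is a minimal base R sketch applied to the example above. It is an assumption-laden illustration, not the actual implementation: parse_choices() and unite_checkbox_sketch() are hypothetical names, plain data frames stand in for tibbles, and the values_fill / multi_value_label arguments borrow the parameter names suggested earlier in this thread. It also assumes the raw value is the suffix after the triple underscore in each column name.

```r
# Toy data mirroring the example above (data frames stand in for tibbles).
nonrepeat_data <- data.frame(
  study_id  = c(1, 2, 3),
  multi___1 = c(TRUE, TRUE, FALSE),
  multi___2 = c(FALSE, TRUE, FALSE),
  multi___3 = c(FALSE, FALSE, FALSE)
)
choices <- "1, Red | 2, Yellow | 3, Blue"
nonrepeat_metadata <- data.frame(
  field_name = c("study_id", "multi___1", "multi___2", "multi___3"),
  select_choices_or_calculations = c(NA, choices, choices, choices)
)

# Hypothetical helper: "1, Red | 2, Yellow" -> c(`1` = "Red", `2` = "Yellow")
parse_choices <- function(x) {
  parts <- trimws(strsplit(x, "|", fixed = TRUE)[[1]])
  raw   <- trimws(sub(",.*$", "", parts))     # text before the first comma
  label <- trimws(sub("^[^,]*,", "", parts))  # text after the first comma
  stats::setNames(label, raw)
}

# Hypothetical helper: collapse checkbox columns into one character vector.
unite_checkbox_sketch <- function(data, metadata, cols,
                                  multi_value_label = "Multiple",
                                  values_fill = NA_character_) {
  # All checkbox columns of one field share the same choice string.
  choice_string <- metadata$select_choices_or_calculations[
    match(cols[1], metadata$field_name)
  ]
  labels <- parse_choices(choice_string)
  # Assumed convention: "multi___2" carries raw value "2".
  raw_values <- sub("^.*___", "", cols)
  # Works for both TRUE/FALSE and 1/0 encodings.
  checked <- as.matrix(data[cols]) == 1

  vapply(seq_len(nrow(data)), function(i) {
    hits <- which(checked[i, ])
    if (length(hits) == 0) {
      values_fill                       # nothing selected
    } else if (length(hits) > 1) {
      multi_value_label                 # more than one selected
    } else {
      labels[[raw_values[hits]]]        # exactly one selected: use its label
    }
  }, character(1))
}

united <- unite_checkbox_sketch(
  nonrepeat_data, nonrepeat_metadata,
  cols = c("multi___1", "multi___2", "multi___3")
)
print(united)
```

The metadata lookup is the step that cannot be done from the data tibble alone, which is why the function needs both pieces however the API is shaped.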

@rsh52 rsh52 mentioned this issue Jul 15, 2024
13 tasks

3 participants