[FEATURE] Multiple Choice to Single Column Function #194

rsh52 · 2024-05-30T21:28:23Z

Feature Request Description

A common use case is consolidating multiple choice fields (i.e. checkbox fields) into a single column, such as when building Table 1s for manuscript reporting.

Proposed Solution

This function, with the placeholder name reduce_multi_to_single_column(), will be the first REDCapTidieR analytic tool that users can apply to columns in their extracted tibbles.

It should:

Be capable of using tidyselect functions (e.g. starts_with("race"))
Let users specify the value that replaces multiple selections (e.g. "Multiple", "Other")
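To make the two requirements concrete, here is a rough sketch, not the REDCapTidieR implementation. The function name collapse_checkboxes(), the multi_label argument, and the output column name combined are all made up for illustration, and it assumes the tidyselect and rlang packages are installed. It returns the checked column's name rather than its label, since labels would have to come from the metadata.

```r
library(tidyselect)
library(rlang)

# Hypothetical helper (illustrative only): collapse logical checkbox columns
# into one column, with a user-chosen label for multiple selections.
collapse_checkboxes <- function(data, cols, multi_label = "Multiple") {
  # Resolve a tidyselect expression such as starts_with("race") to positions
  pos <- eval_select(enquo(cols), data)
  checked <- as.matrix(data[pos])
  n_checked <- rowSums(checked)
  # Name of the first checked column in each row (NA when none is checked)
  first_checked <- apply(checked, 1, function(row) names(pos)[which(row)[1]])
  data$combined <- ifelse(n_checked > 1, multi_label, first_checked)
  data
}

demo <- data.frame(
  id = 1:3,
  race___1 = c(TRUE, TRUE, FALSE),
  race___2 = c(FALSE, TRUE, FALSE)
)

out <- collapse_checkboxes(demo, starts_with("race"))
out
```

Rows with exactly one checked box get that column's name, rows with more than one get multi_label, and rows with none stay NA.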

Additional Context

This was prompted by the request in #192 and should be a more generalizable solution.

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue


ezraporter · 2024-07-12T18:43:15Z

The stuff in 96c3309 looks like a good start. I didn't look closely at the code and mainly focused on the API.

My biggest reaction is that the current API takes a supertibble as input and returns a tibble and I'm not sure that's right. I think we want data manipulation functions like this to be pipe-friendly, which requires that inputs and outputs are the same type of thing.

I think the best way of doing this is to have this function modify a tibble within the supertibble and return the whole thing. To get the current behavior you'd need to do something like:

supertbl |>
  reduce_multi_to_single_column() |>
  extract_tibble()

but this buys us composability:

supertbl |>
  reduce_multi_to_single_column() |>
  reduce_multi_to_single_column() |>
  some_other_transformation() |>
  ...
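A toy illustration of the composability point, in base R. Everything here is an assumption for the sake of the example: a plain list stands in for the supertibble, and count_checked() is a made-up transformation, not a proposed function. The only thing it is meant to show is that a function returning the same container it received can be chained with itself.

```r
# Stand-in for a supertibble: a plain list holding one data frame and its
# metadata (assumption for illustration only).
supertbl <- list(
  data = data.frame(
    study_id = 1:3,
    a___1 = c(TRUE, TRUE, FALSE),
    a___2 = c(FALSE, TRUE, FALSE),
    b___1 = c(FALSE, TRUE, TRUE)
  ),
  metadata = data.frame(field_name = c("study_id", "a___1", "a___2", "b___1"))
)

# Hypothetical transformation: counts checked boxes for one field and returns
# the whole container, so the output type matches the input type.
count_checked <- function(stbl, field, values_to) {
  cols <- startsWith(names(stbl$data), paste0(field, "___"))
  stbl$data[[values_to]] <- unname(rowSums(as.matrix(stbl$data[cols])))
  stbl
}

# Because input and output are the same kind of object, calls chain freely
result <- supertbl |>
  count_checked("a", "a_n") |>
  count_checked("b", "b_n")

result$data
```

The trade-off raised later in the thread still applies: the change is only visible once you look inside result$data.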

Naming thoughts

Maybe we should call this unite_checkbox() in reference to tidyr::unite()?

For parameter names what about:
cols_to -> values_to
no_val -> values_fill
multi_val -> multi_value_label

The first 2 are inspired by pivot_* naming conventions in tidyr

skadauke · 2024-07-12T18:53:21Z

Hi,

There isn't a general rule that says that the pipe should always have inputs and outputs that are the same type of thing. That's more of a dplyr-specific rule that's not shared by other tidy packages such as tidymodels.

That said, I agree with your point and actually think this function should take a tibble and return a modified tibble. So the workflow would be

supertbl |> extract_tibble(x) |> reduce_multi_to_single_column(...)

I think "unite" is much less specific than "reduce_multi_to_single_column".

My thoughts!
S


rsh52 · 2024-07-12T18:53:30Z

My biggest reaction is that the current API takes a supertibble as input and returns a tibble and I'm not sure that's right. I think we want data manipulation functions like this to be pipe-friendly, which requires that inputs and outputs are the same type of thing.

For the API change, I think this is easy to rework once we agree on what the function should take in and return. I definitely see what you're saying. My concern with returning a supertibble is that you don't actually see the changes in the function's output; they're sort of "masked" inside the data tibbles. But maybe that's not a big issue here?

Either way, the function needs access to the metadata raw/label values associated with the checkboxes being united, so I don't see a good way to avoid having users supply the supertibble. Otherwise they'd have to supply the data tibble and metadata tibble separately.

Naming thoughts

I like the naming much better, these all make sense to me.

rsh52 · 2024-07-12T18:55:56Z

That said, I agree with your point and actually think this function should take a tibble and return a modified tibble. So the workflow would be
supertbl |> extract_tibble(x) |> reduce_multi_to_single_column(...)

@skadauke This also makes some sense, but then users would still need to supply the metadata separately. That's what led me to wrapping the extract_() internally.

skadauke · 2024-07-12T18:59:26Z

What metadata is needed for the transformation?


rsh52 · 2024-07-12T19:02:10Z

What metadata is needed for the transformation?

The general rule is to consolidate checkboxes under one column, showing the raw/label value associated with the checkbox if only one value is selected OR a custom value (e.g. "multiple" / "many") if multiple are selected. There's no way to grab these values from the data tibble, since the checkbox columns only hold 1s and 0s or TRUEs and FALSEs. Ex:

> nonrepeat_data
# A tibble: 3 × 4
  study_id multi___1 multi___2 multi___3
     <dbl> <lgl>     <lgl>     <lgl>    
1        1 TRUE      FALSE     FALSE    
2        2 TRUE      TRUE      FALSE    
3        3 FALSE     FALSE     FALSE    
> nonrepeat_metadata
# A tibble: 4 × 2
  field_name select_choices_or_calculations
  <chr>      <chr>                         
1 study_id   NA                            
2 multi___1  1, Red | 2, Yellow | 3, Blue  
3 multi___2  1, Red | 2, Yellow | 3, Blue  
4 multi___3  1, Red | 2, Yellow | 3, Blue
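As a rough base R sketch of why both tables are needed, using the example data above: the label lookup comes from the metadata and the selection pattern from the data. This is not the REDCapTidieR implementation. The helper names parse_choices() and combine_sketch() are made up, the argument names values_to, values_fill, and multi_value_label are borrowed from the naming suggestions earlier in the thread, and the "Multiple" default is just one of the example values mentioned.

```r
nonrepeat_data <- data.frame(
  study_id  = c(1, 2, 3),
  multi___1 = c(TRUE, TRUE, FALSE),
  multi___2 = c(FALSE, TRUE, FALSE),
  multi___3 = c(FALSE, FALSE, FALSE)
)

nonrepeat_metadata <- data.frame(
  field_name = c("study_id", "multi___1", "multi___2", "multi___3"),
  select_choices_or_calculations = c(NA, rep("1, Red | 2, Yellow | 3, Blue", 3))
)

# Parse "1, Red | 2, Yellow | 3, Blue" into c(`1` = "Red", `2` = "Yellow", ...)
parse_choices <- function(choices) {
  parts <- trimws(strsplit(choices, "|", fixed = TRUE)[[1]])
  raw   <- trimws(sub(",.*$", "", parts))    # text before the first comma
  label <- trimws(sub("^[^,]*,", "", parts)) # text after the first comma
  stats::setNames(label, raw)
}

# Hypothetical helper: needs both the data and the metadata, because the
# labels only exist in the metadata.
combine_sketch <- function(data, metadata, field, values_to = field,
                           multi_value_label = "Multiple",
                           values_fill = NA_character_) {
  cols <- names(data)[startsWith(names(data), paste0(field, "___"))]
  choice_string <- metadata$select_choices_or_calculations[
    match(cols[1], metadata$field_name)
  ]
  choices <- parse_choices(choice_string)
  # Map each checkbox column to its label through the raw value in its suffix
  labels <- unname(choices[sub("^.*___", "", cols)])
  checked <- as.matrix(data[cols])
  n_checked <- rowSums(checked)
  single <- labels[max.col(checked * 1, ties.method = "first")]
  combined <- ifelse(n_checked == 0, values_fill,
                     ifelse(n_checked > 1, multi_value_label, single))
  data[[values_to]] <- unname(combined)
  data
}

combined <- combine_sketch(nonrepeat_data, nonrepeat_metadata, "multi")
combined
```

With the example above, study_id 1 maps to "Red", study_id 2 to the multi-value label, and study_id 3 to the fill value.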

rsh52 added the "enhancement" (New feature or request) label May 30, 2024

rsh52 assigned rsh52 and ezraporter May 30, 2024

rsh52 mentioned this issue May 30, 2024

[FEATURE] Add a function to make the default REDCap demographic race variables (7 choices) into a single race variable while allowing for an 8th choice, Multiple Races. #192

Closed

2 tasks

rsh52 mentioned this issue Jul 15, 2024

combine_checkboxes #196

Merged

13 tasks

rsh52 closed this as completed in #196 Aug 13, 2024
