[FEATURE] Multiple Choice to Single Column Function #194
Comments
Hi,
There isn't a general rule that the pipe should always have inputs and outputs of the same type. That's more of a dplyr-specific rule that isn't shared by other tidy packages such as tidymodels.
That said, I agree with your point and actually think this function should take a tibble and return a modified tibble. So the workflow would be
supertbl |> extract_tibble(x) |> reduce_multi_to_single_column(...)
I think "unite" is much less specific than "reduce_multi_to_single_column".
My thoughts!
S
On Friday, July 12, 2024 2:43 PM, Ezra Porter wrote:
The stuff in 96c3309 looks like a good start. I didn't look closely at the code and mainly focused on the API.
My biggest reaction is that the current API takes a supertibble as input and returns a tibble, and I'm not sure that's right. I think we want data manipulation functions like this to be pipe-friendly, which requires that inputs and outputs are the same type of thing.
I think the best way of doing this is to have this function modify a tibble within the supertibble and return the whole thing. To get the current behavior you'd need to do something like:
supertbl |>
reduce_multi_to_single_column() |>
extract_tibble()
but this buys us composability:
supertbl |>
reduce_multi_to_single_column() |>
reduce_multi_to_single_column() |>
some_other_transformation() |>
...
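The supertibble-in, supertibble-out design above can be sketched as a small wrapper that replaces one data tibble and returns the whole object, so calls chain with the native pipe (R 4.1 or later). This is a hypothetical sketch, not REDCapTidieR code: modify_form() and add_flag() are made-up names, a base R data frame with list columns stands in for a real supertibble, and the column names redcap_form_name, redcap_data and redcap_metadata are assumptions.

```r
# Hypothetical wrapper: transform one form's data inside a supertibble-like
# data frame and return the whole object, so input and output types match.
# The redcap_form_name / redcap_data / redcap_metadata columns are assumed.
modify_form <- function(supertbl, form, fn, ...) {
  i <- which(supertbl$redcap_form_name == form)
  if (length(i) != 1) {
    stop("Expected exactly one form named '", form, "'")
  }
  # Give the transformation this form's data and metadata, then store the
  # transformed data back in the list column
  supertbl$redcap_data[[i]] <- fn(
    supertbl$redcap_data[[i]],
    supertbl$redcap_metadata[[i]],
    ...
  )
  supertbl
}

# Toy stand-in for a supertibble with two forms
supertbl <- data.frame(redcap_form_name = c("demographics", "visits"))
supertbl$redcap_data <- list(
  data.frame(study_id = 1:2, age = c(30, 41)),
  data.frame(study_id = 1:2, visit_num = c(1, 2))
)
supertbl$redcap_metadata <- list(
  data.frame(field_name = c("study_id", "age")),
  data.frame(field_name = c("study_id", "visit_num"))
)

# Hypothetical transformation: adds a logical column named by `flag`
# (it receives the metadata but ignores it in this toy example)
add_flag <- function(data, metadata, flag) {
  data[[flag]] <- TRUE
  data
}

# Input and output are both supertibble-like, so calls chain freely
result <- supertbl |>
  modify_form("demographics", add_flag, flag = "reviewed") |>
  modify_form("visits", add_flag, flag = "reviewed")
```

Because the wrapper passes the form's metadata along, a transformation that needs raw/label values could read them without the user supplying the metadata separately.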
Naming thoughts
Maybe we should call this unite_checkbox() in reference to tidyr::unite()?
For parameter names what about:
cols_to -> values_to
no_val -> values_fill
multi_val -> multi_value_label
The first two are inspired by the pivot_* naming conventions in tidyr.
|
For the API change, I think this is easily reworkable if we agree on what it should take as input and return. I definitely see what you're saying. My concern with returning a supertibble is that you don't actually see the changes in the function's output; they're sort of "masked" inside the data tibbles. But maybe that's not a big issue here? Either way, the function needs access to the raw/label values in the metadata for the checkboxes being united, so I don't see much way around having users supply the supertibble. Otherwise, they'd have to supply the data tibble and the metadata tibble separately.
I like the naming much better; these all make sense to me. |
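To make the metadata dependency concrete, here is a minimal sketch of where the raw/label values come from. It assumes REDCap's usual choice string format ("raw, label" pairs separated by "|", as in the select_choices_or_calculations column) and REDCap's field___rawvalue naming for checkbox columns; parse_choices() and checkbox_raw_value() are hypothetical helpers, not REDCapTidieR functions.

```r
# Hypothetical helpers (not REDCapTidieR functions) showing where the
# raw/label values for checkbox columns come from.

# Split a REDCap choice string such as "1, Red | 2, Yellow | 3, Blue"
# into raw values and labels. Choices are separated by "|"; the raw value is
# the text before the first comma, the label is everything after it.
parse_choices <- function(choices) {
  pieces <- trimws(strsplit(choices, "|", fixed = TRUE)[[1]])
  data.frame(
    raw = trimws(sub(",.*$", "", pieces)),
    label = trimws(sub("^[^,]*,", "", pieces)),
    stringsAsFactors = FALSE
  )
}

# Checkbox columns are named <field>___<raw value>,
# so the raw value is the part after the last "___"
checkbox_raw_value <- function(col_names) {
  sub("^.*___", "", col_names)
}

choices <- parse_choices("1, Red | 2, Yellow | 3, Blue")
raw <- checkbox_raw_value(c("multi___1", "multi___2", "multi___3"))

# Look up the label for each checkbox column
labels <- choices$label[match(raw, choices$raw)]
```

None of this information is recoverable from the data tibble alone, whose checkbox columns only hold TRUE/FALSE or 1/0.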
What metadata is needed for the transformation?
On Friday, July 12, 2024 2:56 PM, Rich Hanna wrote:
That said, I agree with your point and actually think this function should take a tibble and return a modified tibble. So the workflow would be
supertbl |> extract_tibble(x) |> reduce_multi_to_single_column(...)
@skadauke This also makes some sense, but then users would still need to supply the metadata separately. That's what led me to wrap extract_() internally.
|
The general rule is to consolidate checkboxes under one column, showing the raw/label value associated with the checkbox if only one value is selected, OR a custom value (e.g. "multiple" / "many") if multiple are selected. There's no way to get these values from the data tibble; the checkbox columns only hold 1s and 0s or TRUEs and FALSEs. Ex:
> nonrepeat_data
# A tibble: 3 × 4
study_id multi___1 multi___2 multi___3
<dbl> <lgl> <lgl> <lgl>
1 1 TRUE FALSE FALSE
2 2 TRUE TRUE FALSE
3 3 FALSE FALSE FALSE
> nonrepeat_metadata
# A tibble: 4 × 2
field_name select_choices_or_calculations
<chr> <chr>
1 study_id NA
2 multi___1 1, Red | 2, Yellow | 3, Blue
3 multi___2 1, Red | 2, Yellow | 3, Blue
4 multi___3 1, Red | 2, Yellow | 3, Blue |
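The consolidation rule above (one checked box gives its raw or label value, several give a custom label, none gives a fill value) can be sketched in base R against the example data. This is a hypothetical sketch, not REDCapTidieR's implementation: unite_checkbox() follows the name suggested in this thread, values_to, values_fill and multi_value_label follow the proposed parameter names, raw_or_label is an added assumption, and the choices are hard-coded as a named vector rather than parsed from the metadata tibble.

```r
# Hypothetical sketch of the checkbox consolidation, in base R.
# Argument names follow the thread's naming proposals; raw_or_label is assumed.
unite_checkbox <- function(data, cols, choices, values_to,
                           values_fill = NA,
                           multi_value_label = "Multiple",
                           raw_or_label = c("label", "raw")) {
  raw_or_label <- match.arg(raw_or_label)
  values_fill <- as.character(values_fill)

  # Checkbox columns are named <field>___<raw value>
  raw <- sub("^.*___", "", cols)
  values <- if (raw_or_label == "label") unname(choices[raw]) else raw

  # Works for TRUE/FALSE and 1/0 columns; missing values count as unchecked
  checked <- as.matrix(data[cols]) == 1
  checked[is.na(checked)] <- FALSE

  data[[values_to]] <- vapply(seq_len(nrow(data)), function(i) {
    hits <- values[checked[i, ]]
    if (length(hits) == 0) {
      values_fill            # nothing checked
    } else if (length(hits) == 1) {
      hits                   # exactly one box checked
    } else {
      multi_value_label      # more than one box checked
    }
  }, character(1))

  # Drop the original checkbox columns, keeping everything else
  data[setdiff(names(data), cols)]
}

# The example data from this comment; choices taken from its metadata
nonrepeat_data <- data.frame(
  study_id = c(1, 2, 3),
  multi___1 = c(TRUE, TRUE, FALSE),
  multi___2 = c(FALSE, TRUE, FALSE),
  multi___3 = c(FALSE, FALSE, FALSE)
)
choices <- c("1" = "Red", "2" = "Yellow", "3" = "Blue")

result <- unite_checkbox(
  nonrepeat_data,
  cols = c("multi___1", "multi___2", "multi___3"),
  choices = choices,
  values_to = "multi"
)
# result$multi holds "Red", "Multiple" and NA for study_id 1, 2 and 3
```

Like tidyr::unite(), this sketch drops the source columns; whether the real function should keep them is an open design choice.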
Feature Request Description
A common use case is consolidating multiple choice fields (i.e. checkbox fields) into a single column, such as when powering Table 1s for manuscript reporting.
Proposed Solution
This function, with the placeholder name reduce_multi_to_single_column(), will be the first REDCapTidieR analytic tool that users can apply to columns in their extracted tibbles. It should:
starts_with("race")
)
Additional Context
This was prompted by the request in #192 and should be a more generalizable solution.