Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add part 6 for household co-analysis and Circadian Rhythm analysis #956

Merged
merged 66 commits into from
Nov 5, 2023

Conversation

vincentvanhees
Copy link
Member

@vincentvanhees vincentvanhees commented Oct 27, 2023

In this PR I add new part 6 to facilitate Household co-analysis and recording level Circadian Rhythm analysis that depends on knowledge about sleep and wake. There is more functionality I want to add, but it may be good to first merge this initial part with:

To use this, the PR introduces a few extra arguments:

  • part6HCA, Boolean to turn on Household Co-analysis
  • part6_threshold_combi, Character to specify which threshold combination needs to be used in part 6 ("40_100_400") in the future we could allow for multiple threshold combinations but for now it works on one.
  • part6CR, Boolean to turn on Circadian rhythm analysis
  • part6Window, vector of length 2 to specify from where to where the time series should be used, e.g. c("O3", "W-2") means from 3rd onset till 2 wakeuptimes before the end.

Checklist before merging:

  • Existing tests still work (check by running the test suite, e.g. from RStudio).
  • Added tests (if you added functionality) or fixed existing test (if you fixed a bug).
  • Updated or expanded the documentation.
  • Updated release notes in inst/NEWS.Rd with a user-readable summary. Please, include references to relevant issues or PR discussions.
  • Added your name to the contributors lists in the DESCRIPTION and CITATION.cff files.

- Test files to aid new part 6 unit test
- No attempt yet to use paralisation for household
- New parameter to let user choose the part 5 threshold combination to use for part 6
@vincentvanhees
Copy link
Member Author

vincentvanhees commented Nov 1, 2023

I just made two additional commits that:

  • Improve handling of non-default argument part6Window values
  • Expand unit test to test those non-default part6Window values.

Outstanding work may be:

  • Reconsider whether part6Window is a good argument name given that it is only used for circadian rhythm. Another approach could be to integrate it also in the household co-analysis or to change the name into part6WindowCR.
  • Function g.report.part6 is not documented I see.
    However, maybe it is easier to wait for your feedback before doing these things.

Copy link
Collaborator

@jhmigueles jhmigueles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vincentvanhees for all this work. It is great, and I got it to work as expected. Here just some input in case it helps, I think we could facilitate more using the new arguments and I also have some feedback for the documentation. More details in the specific comments.

  • I think it is important to document why the household co-analysis is only available with timewindow = "WW" (I left a specific comment on this)
  • New functions are missing in NAMESPACE, maybe they are not needed there, but it facilitates debugging.
  • Is it possible to keep documentation on household more open for other potential applications of this functionality? e.g., matching recordings from different body attachment sites in the same participant. I see this is not what you have developed this funcionality for, but it could help to do this other application as well.
  • Would it be possible to always store the part5 timeseries as part of the part 5 milestone data as GGIR does in part 1 and part 2? Then, the save_ms5raw parameters would only be only be applicable for exporting the time series in csv format. In this case we make sure that the data required to run part 6 is always part of the milestone data and make running part 6 easier for the end users.
  • I would add 3 extra checks in check_params (two of them detailed in a specific comment): 1) if user sets part6HCA = TRUE, then save_ms5raw parameters are internally handled not to be problematic (exported in RData and including the invalid time); 2) if user provides only 1 threshold for lig, mod and vig, then part6_threshold_combi is internally derived; and 3) if part6 is run, then do.report should include the number 6 in the case the user forgot about it.
  • Maybe for a different project in the future, if we would have a flexible enough function to run hte visualreport, this same function could be used to visualize the household co-analysis, just by adding lines to the same plots.

\title{part6AlignIndividuals}
\description{
Align individual time series per household where households are identified
by the character or number string between the first and second '-' in the filename.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the first and the second '-' be the first and the second character as defined by idloc? So that this is a bit more flexible and consistent with current requirements for filename

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After using this a few times, now I think this could complicate things and the current option might be easier to use.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the challenge is that the participant ID only occurs after the seconds '-'. So, the easiest solution may be to use a different character. For example:

studyID-householdID-memberID_restOfFileName.bin

@@ -90,9 +90,11 @@ check_params = function(params_sleep = c(), params_metrics = c(),
"boutcriter.in", "boutcriter.lig", "boutcriter.mvpa",
"threshold.lig", "threshold.mod", "threshold.vig", "boutdur.mvpa",
"boutdur.in", "boutdur.lig")
character_params = c("frag.metrics", "part6_threshold_combi")
check_class("phyact", params = params_phyact, parnames = numeric_params, parclass = "numeric")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add an extra check for part6_threshold_combi so that if user provides only 1 set of thresholds (most usual scenario I would say), then it takes whatever threshold combination it is. In that case this argument is only needed when user provides more than 1 set of thresholds (not very frequent). Something like:

if (length(params_phyact[["threshold.lig"]]) == 1 & length(params_phyact[["threshold.mod"]]) == 1 & length(params_phyact[["threshold.vig"]]) == 1) {

    params_phyact[["part6_threshold_combi"]] = paste(params_phyact[["threshold.lig"]], params_phyact[["threshold.mod"]], params_phyact[["threshold.vig"]], sep = "_")

}

And another check that makes sure that ms5raw parameters are correctly defined in the case that part6HCA = TRUE:

if (params_247[["part6HCA"]] == TRUE) {

    params_output[["save_ms5rawlevels"]] = TRUE
    params_output[["save_ms5raw_format"]] = "RData"
    params_output[["save_ms5raw_without_invalid"]] = FALSE

}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks,

I would add an extra check for part6_threshold_combi...

Thanks, I am now adding this.

And another check that

This was partly in the check_params() with a minor variation:

if (params_247[["part6HCA"]] == TRUE) {
  # Add RData because part 6 will need it
  params_247[["save_ms5raw_format"]] = unique(c(params_247[["save_ms5raw_format"]], "RData"))
  params_247[["save_ms5rawlevels"]] = TRUE
  params_output[["save_ms5raw_without_invalid"]] = FALSE
}

In this way the user can still ask for .csv files if they want them.

will be recognised as household ID `002` with member IDs `001`, `002`, and `M`.

# Configuring GGIR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add save_ms5raw parameters in this section to specify that they need to be calculated and in which way?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are now automatically set by which user does not need to worry about them. Correct?

acc_wake2_before_wake2 | Mean activity of second person to wake up before they woke up
lux_wake1_before_wake2 | Peak LUX of the person who wakes up first during the four minutes before the second person wakes up
lux_wake2_before_wake2 | Peak LUX of second person to wake up before they woke up
deltatime | Difference in wake up time
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think variable names here are lacking the unit of measurement. Is this minutes? seconds? Also for acceleration outcomes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I have now updated these.

For the LUX variables I am skipping this. The unit of light intensity is LUX, but I have tried to avoid the term light as light can be misunderstood as light activity. So, everywhere in GGIR where we discuss light exposure we call it lux. Not ideal, but I am not sure how else to do it.

to the accelerometer threshold combination that is used in part 5 that you want to use
in part 6. GGIR part 5 facilitates multiple threshold combinations but in part 6 you
need to select one.
- `timewindow = WW` to run part 5 analysis based on waking-up to waking-up windows.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to explain why the household co-analysis can only be derived with timewindow = "WW". I guess the reason is that many of the variables compare wakeup time of the house members and it's easier to do that in a "WW" timewindow. Although it would also be possible in a "OO" timewindow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think the WW specification is not needed. The code should work with all of them because it runs on the full timeseries and not on the part 5 windows outcomes. I will now omit those parts of the documentation.

be run by part 6.
}
\item{part6Window}{
Character vector with length two (default = c("start", "end")) to indicate
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if I want to analyse circadian rhythms from first wakeup to last wakeup, this argument should be part6Window = c("W1", "W-1"), is this correct? I think this definition would benefit from a couple of quick examples on how to use it.

}
rm(currentColorPalette)
# Store data
write.csv(x = out2, file = paste0(path_results, "/alignedTimeseries/timeseries_HID_",uHID[h],".csv"), row.names = FALSE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we give the option to store this as RData to save space?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, forgot to respond to this... now looking into it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am now changing this to store RData by default. The few users who may want to work with this data will just have to load the RData files. I think there is no major need to facilitate both csv and RData.

@vincentvanhees vincentvanhees mentioned this pull request Nov 2, 2023
5 tasks
@vincentvanhees
Copy link
Member Author

vincentvanhees commented Nov 2, 2023

Creating a check list for myself based on your feedback:

  • I think it is important to document why the household co-analysis is only available with timewindow = "WW" (I left a specific comment on this)
  • New functions are missing in NAMESPACE, maybe they are not needed there, but it facilitates debugging.
  • Is it possible to keep documentation on household more open for other potential applications of this functionality? e.g., matching recordings from different body attachment sites in the same participant. I see this is not what you have developed this functionality for, but it could help to do this other application as well.

Response: I think for now it is better to pitch this as a household co-analysis functionality and not broaden it, because that is what I have tested it for and for which it has a clear use-case. The disadvantage of making it more generic is that those who want to do household co-analysis may feel that the functionality is not tailored to them. If in the future there would be a clear need for multi-sensor data analysis or comparisons then I think it is best to review whether the current code is truly the best place to do it and what kind of functionality would be needed for it.

  • Would it be possible to always store the part5 timeseries as part of the part 5 milestone data as GGIR does in part 1 and part 2? Then, the save_ms5raw parameters would only be only be applicable for exporting the time series in csv format. In this case we make sure that the data required to run part 6 is always part of the milestone data and make running part 6 easier for the end users.

Response: I see your point but I did not do this because:

  1. This PR makes sure that the data required for part 6 is always part of the output by setting save_ms5rawlevels to TRUE and making sure there is at least RData stored. Although I agree that it will simplify the code a bit if the data was always stored in the same place.
  2. Moving the time series to the normal part 5 milestone data breaks consistency for those who have used RData files from the raw.out folder. They would then have to rewrite their code/packages to use a new folder. I find it difficult to oversee how much problems this will cause.
  3. The current solution allows people to copy the time series files and only work with them. If this is integrated in the part5 milestone data then there is a larger need for us to also document all the milestone data. Currently the milestone data is poorly documented. Storing them in a seperate folder may be a way to encourage users to use them, as opposed to hiding the time series with among other less user-friendly objects.
  4. Store the time series in the milestone files, increases the size of GGIR output significantly for users who are not interested in the time series and who process large datasets.
  • I would add 3 extra checks in check_params (two of them detailed in a specific comment): 1) if user sets part6HCA = TRUE, then save_ms5raw parameters are internally handled not to be problematic (exported in RData and including the invalid time); 2) if user provides only 1 threshold for lig, mod and vig, then part6_threshold_combi is internally derived; and 3) if part6 is run, then do.report should include the number 6 in the case the user forgot about it.

Maybe for a different project in the future, if we would have a flexible enough function to run hte visualreport, this same function could be used to visualize the household co-analysis, just by adding lines to the same plots

I agree, the plot created for HCA was mainly intended to help myself check that the code works. More generally speaking it would be good to work towards a better strategy for visualisation in GGIR, at the moment we have various isolated visualisation while it may be more efficient to have a central visualisation tool to replace all these individual visualisations. Let's discuss separately.

@jhmigueles jhmigueles mentioned this pull request Nov 3, 2023
5 tasks
jhmigueles and others added 9 commits November 4, 2023 10:30
we now have report turned on by default for all parts which will generated a lot of console messages if the user only wants to run a few parts, I think this clutters the console unnecessarily. So, i am now turning these off when the report is not generated. If the report is generated all works as normal.
@vincentvanhees vincentvanhees merged commit 3a62d29 into master Nov 5, 2023
8 checks passed
@vincentvanhees vincentvanhees deleted the issue906_create_part6 branch November 5, 2023 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorporate household co-analysis Add optional cosinor analysis to part 6
3 participants