-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add part 6 for household co-analysis and Circadian Rhythm analysis #956
Conversation
- Test files to aid new part 6 unit test - No attempt yet to use paralisation for household - New parameter to let user choose the part 5 threshold combination to use for part 6
…ic time segment for CR analysis in part 6 #906
Outstanding work: documentation
I just made two additional commits that:
Outstanding work may be:
|
… excluded by part 4 because of excludefirstlast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vincentvanhees for all this work. It is great, and I got it to work as expected. Here just some input in case it helps, I think we could facilitate more using the new arguments and I also have some feedback for the documentation. More details in the specific comments.
- I think it is important to document why the household co-analysis is only available with timewindow = "WW" (I left a specific comment on this)
- New functions are missing in NAMESPACE, maybe they are not needed there, but it facilitates debugging.
- Is it possible to keep documentation on household more open for other potential applications of this functionality? e.g., matching recordings from different body attachment sites in the same participant. I see this is not what you have developed this funcionality for, but it could help to do this other application as well.
- Would it be possible to always store the part5 timeseries as part of the part 5 milestone data as GGIR does in part 1 and part 2? Then, the save_ms5raw parameters would only be only be applicable for exporting the time series in csv format. In this case we make sure that the data required to run part 6 is always part of the milestone data and make running part 6 easier for the end users.
- I would add 3 extra checks in check_params (two of them detailed in a specific comment): 1) if user sets part6HCA = TRUE, then save_ms5raw parameters are internally handled not to be problematic (exported in RData and including the invalid time); 2) if user provides only 1 threshold for lig, mod and vig, then part6_threshold_combi is internally derived; and 3) if part6 is run, then do.report should include the number 6 in the case the user forgot about it.
- Maybe for a different project in the future, if we would have a flexible enough function to run hte visualreport, this same function could be used to visualize the household co-analysis, just by adding lines to the same plots.
\title{part6AlignIndividuals} | ||
\description{ | ||
Align individual time series per household where households are identified | ||
by the character or number string between the first and second '-' in the filename. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the first and the second '-' be the first and the second character as defined by idloc? So that this is a bit more flexible and consistent with current requirements for filename
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After using this a few times, now I think this could complicate things and the current option might be easier to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the challenge is that the participant ID only occurs after the seconds '-'. So, the easiest solution may be to use a different character. For example:
studyID-householdID-memberID_restOfFileName.bin
@@ -90,9 +90,11 @@ check_params = function(params_sleep = c(), params_metrics = c(), | |||
"boutcriter.in", "boutcriter.lig", "boutcriter.mvpa", | |||
"threshold.lig", "threshold.mod", "threshold.vig", "boutdur.mvpa", | |||
"boutdur.in", "boutdur.lig") | |||
character_params = c("frag.metrics", "part6_threshold_combi") | |||
check_class("phyact", params = params_phyact, parnames = numeric_params, parclass = "numeric") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add an extra check for part6_threshold_combi
so that if user provides only 1 set of thresholds (most usual scenario I would say), then it takes whatever threshold combination it is. In that case this argument is only needed when user provides more than 1 set of thresholds (not very frequent). Something like:
if (length(params_phyact[["threshold.lig"]]) == 1 & length(params_phyact[["threshold.mod"]]) == 1 & length(params_phyact[["threshold.vig"]]) == 1) {
params_phyact[["part6_threshold_combi"]] = paste(params_phyact[["threshold.lig"]], params_phyact[["threshold.mod"]], params_phyact[["threshold.vig"]], sep = "_")
}
And another check that makes sure that ms5raw
parameters are correctly defined in the case that part6HCA = TRUE
:
if (params_247[["part6HCA"]] == TRUE) {
params_output[["save_ms5rawlevels"]] = TRUE
params_output[["save_ms5raw_format"]] = "RData"
params_output[["save_ms5raw_without_invalid"]] = FALSE
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks,
I would add an extra check for part6_threshold_combi...
Thanks, I am now adding this.
And another check that
This was partly in the check_params() with a minor variation:
if (params_247[["part6HCA"]] == TRUE) {
# Add RData because part 6 will need it
params_247[["save_ms5raw_format"]] = unique(c(params_247[["save_ms5raw_format"]], "RData"))
params_247[["save_ms5rawlevels"]] = TRUE
params_output[["save_ms5raw_without_invalid"]] = FALSE
}
In this way the user can still ask for .csv files if they want them.
will be recognised as household ID `002` with member IDs `001`, `002`, and `M`. | ||
|
||
# Configuring GGIR | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also add save_ms5raw
parameters in this section to specify that they need to be calculated and in which way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are now automatically set by which user does not need to worry about them. Correct?
vignettes/HouseHoldCoanalysis.Rmd
Outdated
acc_wake2_before_wake2 | Mean activity of second person to wake up before they woke up | ||
lux_wake1_before_wake2 | Peak LUX of the person who wakes up first during the four minutes before the second person wakes up | ||
lux_wake2_before_wake2 | Peak LUX of second person to wake up before they woke up | ||
deltatime | Difference in wake up time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think variable names here are lacking the unit of measurement. Is this minutes? seconds? Also for acceleration outcomes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I have now updated these.
For the LUX variables I am skipping this. The unit of light intensity is LUX, but I have tried to avoid the term light as light can be misunderstood as light activity. So, everywhere in GGIR where we discuss light exposure we call it lux. Not ideal, but I am not sure how else to do it.
vignettes/HouseHoldCoanalysis.Rmd
Outdated
to the accelerometer threshold combination that is used in part 5 that you want to use | ||
in part 6. GGIR part 5 facilitates multiple threshold combinations but in part 6 you | ||
need to select one. | ||
- `timewindow = WW` to run part 5 analysis based on waking-up to waking-up windows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful to explain why the household co-analysis can only be derived with timewindow = "WW". I guess the reason is that many of the variables compare wakeup time of the house members and it's easier to do that in a "WW" timewindow. Although it would also be possible in a "OO" timewindow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think the WW specification is not needed. The code should work with all of them because it runs on the full timeseries and not on the part 5 windows outcomes. I will now omit those parts of the documentation.
be run by part 6. | ||
} | ||
\item{part6Window}{ | ||
Character vector with length two (default = c("start", "end")) to indicate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if I want to analyse circadian rhythms from first wakeup to last wakeup, this argument should be part6Window = c("W1", "W-1")
, is this correct? I think this definition would benefit from a couple of quick examples on how to use it.
R/part6AlignIndividuals.R
Outdated
} | ||
rm(currentColorPalette) | ||
# Store data | ||
write.csv(x = out2, file = paste0(path_results, "/alignedTimeseries/timeseries_HID_",uHID[h],".csv"), row.names = FALSE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we give the option to store this as RData
to save space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, forgot to respond to this... now looking into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am now changing this to store RData by default. The few users who may want to work with this data will just have to load the RData files. I think there is no major need to facilitate both csv and RData.
…nd turn improve preparation for part6HCA
Creating a check list for myself based on your feedback:
Response: I think for now it is better to pitch this as a household co-analysis functionality and not broaden it, because that is what I have tested it for and for which it has a clear use-case. The disadvantage of making it more generic is that those who want to do household co-analysis may feel that the functionality is not tailored to them. If in the future there would be a clear need for multi-sensor data analysis or comparisons then I think it is best to review whether the current code is truly the best place to do it and what kind of functionality would be needed for it.
Response: I see your point but I did not do this because:
I agree, the plot created for HCA was mainly intended to help myself check that the code works. More generally speaking it would be good to work towards a better strategy for visualisation in GGIR, at the moment we have various isolated visualisation while it may be more efficient to have a central visualisation tool to replace all these individual visualisations. Let's discuss separately. |
…rt6_branch Minor edits to PR issue906_create_part6 branch
…ining the timeseries
…lems in g.report.part6
we now have report turned on by default for all parts which will generated a lot of console messages if the user only wants to run a few parts, I think this clutters the console unnecessarily. So, i am now turning these off when the report is not generated. If the report is generated all works as normal.
Edits jairo to issue906
In this PR I add new part 6 to facilitate Household co-analysis and recording level Circadian Rhythm analysis that depends on knowledge about sleep and wake. There is more functionality I want to add, but it may be good to first merge this initial part with:
To use this, the PR introduces a few extra arguments:
part6HCA
, Boolean to turn on Household Co-analysispart6_threshold_combi
, Character to specify which threshold combination needs to be used in part 6 ("40_100_400") in the future we could allow for multiple threshold combinations but for now it works on one.part6CR
, Boolean to turn on Circadian rhythm analysispart6Window
, vector of length 2 to specify from where to where the time series should be used, e.g. c("O3", "W-2") means from 3rd onset till 2 wakeuptimes before the end.Checklist before merging:
inst/NEWS.Rd
with a user-readable summary. Please, include references to relevant issues or PR discussions.DESCRIPTION
andCITATION.cff
files.