Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create internal data with part 1 output to run tests #963

Closed
jhmigueles opened this issue Nov 3, 2023 · 5 comments · Fixed by #1108
Closed

Create internal data with part 1 output to run tests #963

jhmigueles opened this issue Nov 3, 2023 · 5 comments · Fixed by #1108
Assignees

Comments

@jhmigueles
Copy link
Collaborator

jhmigueles commented Nov 3, 2023

Adapted from @vincentvanhees comment here

Running part 1 in multiple unit tests slows down package testing.

Proposal 1: Include the part 1 output in internal package data and then re-use that for every unit test that is not directly concerned with testing part 1 but depends on part 1 output. Storing the full part 1 output takes up too much space. In #956 I solved it by only storing metalong, object. What we could do is that we use that data to reconstruct M$metalong as I do in that branch and next we could simulate a matching metashort.

Proposal 2: Another approach may be to make sure that part 1 is run once when running the tests and that its data remains available for other unit tests. I do not know how to do this but it may be worth investigating.

@jhmigueles jhmigueles self-assigned this Nov 3, 2023
@vincentvanhees
Copy link
Member

vincentvanhees commented Nov 8, 2023

@jhmigueles

Another approach may be to combine unit tests more efficiently. We currently have 50 separate unit tests which is a lot.
For example, we could pool them in a way that they are structured with multipel unit test inside a single test file as:

GGIR(mode = 1,...)

test_that("functionality A that depends on part 1 output", {
 GGIR(mode = 2:5,...)
}
test_that("functionality B that depends on part 1 output", {
}
test_that("functionality C that depends on part 1 output", {
}

this may save a lot of computing time as multiple tests can then utilise the initial part 1 output and help to condense the number of unit test files into clear groups of similar functionality.

For example, we could have the following unit test files:

  • All of GGIR (test_chainof5parts.R)
  • All function in part 1.
  • All function in part 2
  • All function in part 3
  • All function in part 4
  • All function in part 5

Well, maybe this is too ambitious, but we could try to consider that while also looking for ways to internally store part 1 data.

@jhmigueles
Copy link
Collaborator Author

@vincentvanhees I investigated one option to create part 1 milestone data to be used in the tests. The create_part1_meta function uses the data objects that were already stored in the data folder in the package and repeats the data there to meet the desired duration of the recording (defined with Ndays). The function comes with the possibility to add or not to add light column to speed up the generation if not needed.

As this function takes data that were already stored in the package, it doesn't affect the size of the package. I tested this in test_lightPart5, and locally in my laptop the test takes ~35 seconds in the master branch version, and ~15 seconds using this new approach.

@vincentvanhees
Copy link
Member

I think that is a good idea as it does not require major changes to the existing tests.

@jhmigueles
Copy link
Collaborator Author

I have been working on this during the weekend and just commited the changes in this branch, here an update:

  • I have reduced the usage of create_test_acc_csv function and now all tests involving part 1 are condensed in a single test routine which only uses this function once.
  • I have stored new data in the data folder, with updated versions of part 1 metadata (M, C, I, filefodername, filename_dir, and tail_expansion_log objects). These data are used in some tests now. I have not removed the older data files just in case other users depend on them. We can discuss about this when the work is more advanced.
  • I have removed the test_create_test_acc_csv.R since this file only tested that the csv file is generated, and this is also tested in many of the other tests.
  • test_part3_1night.R and test_part3_no4am.R are now in a same test routine named test_part3_shortRecordings.R.

I am aware that soon there will be changes in part 1 from PR1027, so I will stop working on this until that PR is merged to avoid complications.

@vincentvanhees
Copy link
Member

vincentvanhees commented Apr 4, 2024

As discussed today, we will explore:

  1. Would be good to investigate whether create_test_acc_csv can be speeded up
  2. Explore ways to store test files in a separate package or online and download from there, e.g. zenodo? (similar to how read.gt3x obtains its test file). If this is possible then add:
    • Multiple Actigraph csv files as we currently generate in the tests
    • Real files from various sensor brands
    • Matching sleeplog and activitylog files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants