Create internal data with part 1 output to run tests #963

jhmigueles · 2023-11-03T10:37:09Z

Adapted from @vincentvanhees comment here

Running part 1 in multiple unit tests slows down package testing.

Proposal 1: Include the part 1 output in internal package data and then re-use that for every unit test that is not directly concerned with testing part 1 but depends on part 1 output. Storing the full part 1 output takes up too much space. In #956 I solved it by only storing metalong, object. What we could do is that we use that data to reconstruct M$metalong as I do in that branch and next we could simulate a matching metashort.

Proposal 2: Another approach may be to make sure that part 1 is run once when running the tests and that its data remains available for other unit tests. I do not know how to do this but it may be worth investigating.

vincentvanhees · 2023-11-08T14:57:13Z

@jhmigueles

Another approach may be to combine unit tests more efficiently. We currently have 50 separate unit tests which is a lot.
For example, we could pool them in a way that they are structured with multipel unit test inside a single test file as:

GGIR(mode = 1,...)

test_that("functionality A that depends on part 1 output", {
 GGIR(mode = 2:5,...)
}
test_that("functionality B that depends on part 1 output", {
}
test_that("functionality C that depends on part 1 output", {
}

this may save a lot of computing time as multiple tests can then utilise the initial part 1 output and help to condense the number of unit test files into clear groups of similar functionality.

For example, we could have the following unit test files:

All of GGIR (test_chainof5parts.R)
All function in part 1.
All function in part 2
All function in part 3
All function in part 4
All function in part 5

Well, maybe this is too ambitious, but we could try to consider that while also looking for ways to internally store part 1 data.

jhmigueles · 2023-11-28T09:21:49Z

@vincentvanhees I investigated one option to create part 1 milestone data to be used in the tests. The create_part1_meta function uses the data objects that were already stored in the data folder in the package and repeats the data there to meet the desired duration of the recording (defined with Ndays). The function comes with the possibility to add or not to add light column to speed up the generation if not needed.

As this function takes data that were already stored in the package, it doesn't affect the size of the package. I tested this in test_lightPart5, and locally in my laptop the test takes ~35 seconds in the master branch version, and ~15 seconds using this new approach.

vincentvanhees · 2023-11-30T08:16:23Z

I think that is a good idea as it does not require major changes to the existing tests.

jhmigueles · 2024-02-05T20:10:35Z

I have been working on this during the weekend and just commited the changes in this branch, here an update:

I have reduced the usage of create_test_acc_csv function and now all tests involving part 1 are condensed in a single test routine which only uses this function once.
I have stored new data in the data folder, with updated versions of part 1 metadata (M, C, I, filefodername, filename_dir, and tail_expansion_log objects). These data are used in some tests now. I have not removed the older data files just in case other users depend on them. We can discuss about this when the work is more advanced.
I have removed the test_create_test_acc_csv.R since this file only tested that the csv file is generated, and this is also tested in many of the other tests.
test_part3_1night.R and test_part3_no4am.R are now in a same test routine named test_part3_shortRecordings.R.

I am aware that soon there will be changes in part 1 from PR1027, so I will stop working on this until that PR is merged to avoid complications.

vincentvanhees · 2024-04-04T08:26:46Z

As discussed today, we will explore:

Would be good to investigate whether create_test_acc_csv can be speeded up
Explore ways to store test files in a separate package or online and download from there, e.g. zenodo? (similar to how read.gt3x obtains its test file). If this is possible then add:
- Multiple Actigraph csv files as we currently generate in the tests
- Real files from various sensor brands
- Matching sleeplog and activitylog files.

jhmigueles self-assigned this Nov 3, 2023

vincentvanhees self-assigned this Apr 4, 2024

vincentvanhees added a commit that referenced this issue Apr 16, 2024

speeding up create_test_acc_csv fixes #963

1d268ab

vincentvanhees mentioned this issue Apr 16, 2024

Speedup create_test_acc_csv (as only used for unit tests) #1108

Merged

22 tasks

vincentvanhees closed this as completed in #1108 Apr 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create internal data with part 1 output to run tests #963

Create internal data with part 1 output to run tests #963

jhmigueles commented Nov 3, 2023 •

edited

Loading

vincentvanhees commented Nov 8, 2023 •

edited

Loading

jhmigueles commented Nov 28, 2023

vincentvanhees commented Nov 30, 2023

jhmigueles commented Feb 5, 2024

vincentvanhees commented Apr 4, 2024 •

edited

Loading

Create internal data with part 1 output to run tests #963

Create internal data with part 1 output to run tests #963

Comments

jhmigueles commented Nov 3, 2023 • edited Loading

vincentvanhees commented Nov 8, 2023 • edited Loading

jhmigueles commented Nov 28, 2023

vincentvanhees commented Nov 30, 2023

jhmigueles commented Feb 5, 2024

vincentvanhees commented Apr 4, 2024 • edited Loading

jhmigueles commented Nov 3, 2023 •

edited

Loading

vincentvanhees commented Nov 8, 2023 •

edited

Loading

vincentvanhees commented Apr 4, 2024 •

edited

Loading