Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rethinking looper testing #464

Closed
nsheff opened this issue Feb 16, 2024 · 8 comments
Closed

Rethinking looper testing #464

nsheff opened this issue Feb 16, 2024 · 8 comments
Milestone

Comments

@nsheff
Copy link
Contributor

nsheff commented Feb 16, 2024

We seem to have lots of pytests for looper, and even a 'smoketests' subfolder.

What is the distinction between the regular tests and the smoketests? at first glance, they just look like other unit tests.

Anyway, it's clear to me after tying to get looper to work today that our tests are not effective for actually testing looper functionality. I ran into so many little issues that it's clear we're not actually testing the way looper is actually used.

I think we need to rethink the way we are testing this package. I think we should redesign tests around some kind of more holistic use cases, and maybe step away from testing small modular units. It just doesn't seem to be working.

A possible test would be: a few basic dummy pipelines that execute something. Run these on some demo datasets, with some expected failures, some re-runs, etc. Basically, the kind of use that a person would actually use looper to do.

@nsheff
Copy link
Contributor Author

nsheff commented Feb 16, 2024

One idea would be to have a test that clones the hello_looper repository and then runs these example pipelines.

Or something like that. I think we did this for peppy, having it use the example_peps repo.

@nsheff nsheff added this to the v2.0.0 milestone Feb 16, 2024
@donaldcampbelljr
Copy link
Contributor

Ok, I've started a branch to work on this. I have a skeleton to have two large tests for using looper with and without pipestat integration. I'll look at the peppy example and the pypiper unit test for inspiration as I continue.

@donaldcampbelljr
Copy link
Contributor

I have a proof of concept going where I can clone the hello_looper repository into a tempdir and execute the pipestat configured run successfully.

@donaldcampbelljr
Copy link
Contributor

One issue I've run into when using Looper with Pipestat is that I would clone the hello_looper repo and run the command:
looper run --looper-config /tmp/tmp5h3_if__/pipestat/.looper_pipestat_shell.yaml

Pytest is running the command from a directory that is not the pipestat folder. I can manually reproduce this.

Issue: Pointing to the looper config file works for everything except finding the samples. Looper complains it cannot find:

wc: data/frog2_data.txt: No such file or directory

This is because the samples.csv file tells looper to look in the data folder relative to the location of the .looper.yaml file.

One option is to change the pipeline interface such that (see asterisks below):

command_template: >
  {looper.piface_dir}/count_lines_pipestat.sh **{sample.file}** {sample.sample_name} {pipestat.config_file}

becomes something like:

command_template: >
  {looper.piface_dir}/count_lines_pipestat.sh **{looper.data_directory}{sample.file}** {sample.sample_name} {pipestat.config_file}

In reality I know the data directory is relative to to the config file, so I could just use the config file directory when writing the interface. However, I don't see this as part of the looper namespace that is built for this purpose.

It seems like the proper solution is that Looper should get the absolute path to the data file which is does not. Here is an example submission script that exhibits the issue.

{
/tmp/tmp5h3_if__/pipestat/./pipeline_pipestat/count_lines_pipestat.sh 
data/frog1_data.txt 
frog_1 
/tmp/tmp5h3_if__/pipestat/looper_pipestat_config.yaml 
} | tee /tmp/tmp5h3_if__/pipestat/./results/submission/example_pipestat_pipeline_frog_1.log

@donaldcampbelljr
Copy link
Contributor

Official documentation on PEP suggests derived columns to eliminate paths from the sample table:
https://pep.databio.org/spec/howto-eliminate-paths/

So my above solution is probably not actually what we want.

@donaldcampbelljr
Copy link
Contributor

Ok, I refactored the pipestat hello_looper example (branch dev_derive) to use derived attributes: pepkit/hello_looper@0927058

Then, in looper pytest, I open the project config and replace the source1 path with the new temp_dir path:
d4cfe23

This (sort of) simulates the user using an environment variable to point to the data.

This now appears to be working in pytest as desired AND the hello_looper examples works exactly the same as before from a user perspective.

@donaldcampbelljr
Copy link
Contributor

donaldcampbelljr commented Mar 20, 2024

  • Merge hello_looper changes to dev branch before releasing v1.8.0

@donaldcampbelljr
Copy link
Contributor

Refactoring of all tests is now complete and PEPs pull from the hello-looper dev branch. I've also added comprehensive tests, but, as discussed, these will continue to be a WIP as we add complexity to these comprehensive tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants