Updating the test data for tools/cellpose #1495

SaimMomin12 · 2024-09-02T09:29:10Z

This pull request updates the test data of the cellpose tool for the existing PR #1494 and fixes #1494.

SaimMomin12 · 2024-09-02T09:51:58Z

For the tests here, should we keep the delta_frac high?

bgruening · 2024-09-02T10:17:57Z

Let's ask @kostrykin; my guess is that we should switch to asserts. This would hopefully result in less work in the next bot PR.

kostrykin · 2024-09-02T13:23:47Z

Yes, using delta_frac to verify the result image is not reliable.
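To show why a size-based check cannot validate a segmentation result, here is a minimal sketch of a delta_frac-style comparison. It assumes, as later comments in this thread suggest, that the old tests compared file sizes with a relative tolerance. The helper `sizes_within_delta_frac` is hypothetical and is not Galaxy's implementation.

```python
def sizes_within_delta_frac(expected: bytes, actual: bytes, delta_frac: float) -> bool:
    """Hypothetical size-only check: pass if the relative size difference is at most delta_frac."""
    expected_size = len(expected)
    return abs(len(actual) - expected_size) <= delta_frac * expected_size


# Two 4x4 8-bit label images stored as raw bytes (one byte per pixel):
# same shape and dtype, hence the same size, but different segmentations.
expected = bytes([0, 0, 1, 1] * 4)
actual = bytes([2, 2, 0, 0] * 4)

print(sizes_within_delta_frac(expected, actual, delta_frac=0.1))  # True: sizes match
print(expected == actual)  # False: the pixel contents differ
```

Compressed formats such as PNG do not guarantee equal sizes for equal content, but in either direction a size tolerance says nothing about which pixels carry which labels.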

I'd suggest using the image_diff comparison method https://docs.galaxyproject.org/en/master/dev/schema.html#tool-tests-test-output, because there are composite images involved (i.e. figures with multiple images). Assertions are not suitable for that kind of image.

For the other images, using assertions should be possible, but since you have already generated the expected data, I think image_diff is more reliable and easier.
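What makes an image comparison stricter than a size check is that it looks at the pixels. The sketch below computes a mean absolute error between two equally sized grayscale images and compares it against a tolerance. The names `mean_absolute_error`, `images_match`, and `eps` are chosen for illustration; Galaxy's own image_diff may normalize intensities or compute its metrics differently.

```python
def mean_absolute_error(a, b):
    """Mean absolute pixel difference between two images given as lists of rows."""
    if len(a) != len(b) or any(len(row_a) != len(row_b) for row_a, row_b in zip(a, b)):
        raise ValueError("images must have the same shape")
    pixel_count = sum(len(row) for row in a)
    if pixel_count == 0:
        return 0.0  # two empty images are treated as identical
    total = sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    return total / pixel_count


def images_match(expected, actual, eps=0.01):
    """Hypothetical image_diff-style check: pass if the mean absolute error is at most eps."""
    return mean_absolute_error(expected, actual) <= eps


expected = [[0, 0], [1, 1]]
actual = [[0, 0], [1, 2]]
print(mean_absolute_error(expected, actual))  # 0.25
print(images_match(expected, actual, eps=0.01))  # False
```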

SaimMomin12 · 2024-09-02T14:48:49Z

@kostrykin Thanks for the pointers. I will replace the tests with image_diff comparison and check if the tests pass.

SaimMomin12 · 2024-09-03T11:01:15Z

Using image_diff did not help in this case. We might have to think of an alternative.

kostrykin · 2024-09-03T11:38:58Z

As I see from the test results https://github.com/bgruening/galaxytools/actions/runs/10681249964?pr=1495, the issue is that the difference between the images generated by Cellpose on the one hand, and the expected images on the other hand, is simply too large. This is not an issue of testing, but of Cellpose generating unexpected results. The question is, why is that happening?

I'm trying to inspect the generated data, but it gets deleted even though I've used --no_cleanup for planemo test. Any idea how to get those?

kostrykin · 2024-09-03T12:54:40Z

OK, I managed to generate the results using --update_test_data (kind of a workaround, but anyway).

The issue is that the generated results are indeed very different from the expected results.

Here is the expected image for img02_cp_masks_cyto.tif:

And here is what Cellpose generated for img02_cp_masks_cyto.tif:

SaimMomin12 · 2024-09-04T08:38:30Z

@kostrykin this behaviour is quite strange. Are you using --biocontainers in your planemo test command?

kostrykin · 2024-09-04T11:48:45Z

> @kostrykin this behaviour is quite strange. Are you using --biocontainers in your planemo test command?

It takes forever to run with --biocontainers (several hours), so I omitted it at first.

Interestingly, Test 1 fails when running with --biocontainers and --update_test_data:

Problem:
Expecting value: line 1 column 1 (char 0)
Command line:
export CELLPOSE_LOCAL_MODELS_PATH='cellpose_models' && mkdir -p segmentation && ln -s '/tmp/tmpgzgkjdjl/files/a/e/b/dataset_aebc8958-c288-417a-b3e7-b8b78370474d.dat' ./image.png && python '/home/void/Documents/SaimMomin12-galaxytools/tools/cellpose/cp_segmentation.py' --inputs '/tmp/tmpgzgkjdjl/job_working_directory/000/2/configs/tmprb58hfv0' --img_path ./image.png --img_format 'png' --output_dir ./segmentation
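The message "Expecting value: line 1 column 1 (char 0)" is the text Python's json module produces when it is asked to parse an empty or non-JSON input. Assuming, without having checked cp_segmentation.py, that the script reads the file passed via --inputs as JSON, this would point to an empty or malformed config file in that job. A minimal reproduction with a hypothetical `load_config` helper:

```python
import json


def load_config(text: str):
    """Hypothetical helper: parse a config as JSON and report parse errors like the test log."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        return f"Problem: {err}"


print(load_config(""))  # Problem: Expecting value: line 1 column 1 (char 0)
print(load_config('{"key": "value"}'))  # {'key': 'value'}
```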

The output of Cellpose for img02_cp_masks_cyto.tif is also different:

So we have three different outputs: one for the expected data, one without --biocontainers, and another one with --biocontainers. I have absolutely no idea what is going on here. How did you generate the output for the expected data?

SaimMomin12 · 2024-09-04T12:03:39Z

@kostrykin Ideally, the tool needs to be tested with --biocontainers, as this simulates the exact behaviour of the Planemo run on GitHub.

I updated the test data with the following command:
planemo t cellpose.xml --galaxy_root /home/momins/galaxy/ --biocontainers --update_test_data

kostrykin · 2024-09-04T14:15:57Z

> @kostrykin Ideally the tools needs to be tested with --biocontainers as this simulates the exact behaviour of Planemo run on GitHub.
>
> I updated the test data with the following command planemo t cellpose.xml --galaxy_root /home/momins/galaxy/ --biocontainers --update_test_data

I did planemo test --update_test_data --biocontainers.

Test 1 fails to run with the above-mentioned error. I attached the other generated results: test-data.tgz

Do you obtain the same results when you re-generate them?

SaimMomin12 · 2024-09-04T14:38:55Z

The *.tif files are similar, but the generated PNGs are different from yours.

kostrykin · 2024-09-05T08:50:31Z

I just tested the base branch on my computer.

Using --biocontainers is still horribly slow, but faster than with the new version in this PR (about 1-2 hours instead of 2-3). Moreover, every test passes when using --update_test_data.

For the img02_cp_masks_cyto.tif file, I again get a different result from the expected image in the test-data directory. But the file has the same size! Since the tests used file size comparison before, I suspect that the issue we are facing here was already present in the base branch version.

Consequently, since the base branch version is faster and does not exhibit failing tests, and to better understand what is going on, I suggest the following strategy:

1. First, migrate the tests of the base branch to image_diff. In the course of this, we should investigate whether we get consistent results in CI and locally, and fix everything that needs to be fixed.
2. Then redo this PR against the fixed base branch.
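One way to check whether results are consistent between CI and local runs is to compare content hashes of two output directories, for example the test data written by --update_test_data in each environment. This is a minimal sketch with hypothetical function names; note that byte-identical files are a stricter criterion than an image_diff tolerance.

```python
import hashlib
from pathlib import Path


def digest_outputs(directory):
    """Map each file path (relative to directory) to the SHA-256 hex digest of its contents."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diverging_files(dir_a, dir_b):
    """List files whose contents differ or that exist in only one of the two directories."""
    a, b = digest_outputs(dir_a), digest_outputs(dir_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


# Usage (directory names are placeholders): diverging_files("test-data-local", "test-data-ci")
```

Running it twice on outputs from the same environment would also reveal whether a single setup is deterministic on its own.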

SaimMomin12 · 2024-09-06T07:34:08Z

@kostrykin I will first migrate the image_diff changes to the base branch, check whether all tests pass, and then compare the results of CI and local runs. For testing this, should we update the tool version or keep the same version?

kostrykin · 2024-09-06T14:42:03Z

> @kostrykin I will first migrate the changes of image_diff to the base branch and see if all tests pass and then compare the results of CI and locally. For testing this, we update the tool version or retain the same version?

I think we keep the TOOL_VERSION. Possibly it would make sense to increment the VERSION_SUFFIX.
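For context, Galaxy tools commonly declare their version as @TOOL_VERSION@+galaxy@VERSION_SUFFIX@, so keeping TOOL_VERSION and incrementing VERSION_SUFFIX would turn, say, 3.0.10+galaxy0 into 3.0.10+galaxy1. A tiny sketch of that bump; the helper is hypothetical, and treating a version without a suffix as +galaxy0 is an assumption.

```python
def bump_version_suffix(version: str) -> str:
    """Increment the +galaxyN suffix while keeping the upstream tool version unchanged."""
    tool_version, separator, suffix = version.partition("+galaxy")
    # Assumption: a version without a +galaxy suffix counts as +galaxy0.
    current = int(suffix) if separator else 0
    return f"{tool_version}+galaxy{current + 1}"


print(bump_version_suffix("3.0.10+galaxy0"))  # 3.0.10+galaxy1
```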

SaimMomin12 · 2024-09-19T10:09:18Z

@kostrykin I was testing the base branch of cellpose and migrated the image_diff changes to it in my last commit. The tests fail here as well due to different images being generated during tests in CI.

You are correct that all the tests of the tool pass when --update_test_data is used locally, but they do fail here in CI. I am not sure why this is happening. Did you find anything similar?

kostrykin · 2024-09-19T12:25:31Z

Nope, unfortunately, I haven't seen similar issues before. Basically, there are two options, I guess: either a dependency of the Cellpose package is resolved to different versions and this induces the diverging behavior, or some internal switch of Cellpose causes the divergence. Are the results consistent when running locally with CPU vs. GPU utilization?
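To test the first hypothesis, one could record the resolved versions of Cellpose and some key dependencies in each environment (local conda, local container, CI) and diff them. A minimal sketch using the standard-library importlib.metadata; the package list is an assumption, and a conda build may register a library under a different distribution name than its PyPI wheel, so a missing entry does not prove the library is absent.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: installed version, or None if it is not found}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def version_differences(env_a, env_b):
    """Return {name: (version_a, version_b)} for every package whose versions differ."""
    return {
        name: (env_a.get(name), env_b.get(name))
        for name in sorted(env_a.keys() | env_b.keys())
        if env_a.get(name) != env_b.get(name)
    }


# Assumed package list; call installed_versions(PACKAGES) in each environment and compare.
PACKAGES = ["cellpose", "torch", "numpy", "opencv-python-headless"]
```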

Maybe those who implemented/updated the Cellpose wrapper have an idea? @sunyi000 @pavanvidem

bgruening · 2024-09-22T10:30:34Z

Here is the updated test data, generated with Docker: #1505

sunyi000 · 2024-09-24T13:02:08Z

> Nope, unfortunately, I haven't seen similar issues before. Basically there are two options, I guess: Either a dependency of the Cellpose package is resolved to different versions and this induces the diverging behavior. Or it is some internal switch of Cellpose, that causes the divergence. Are the results consistent when running locally with CPU vs. GPU utilization?
>
> Maybe those who implemented/updated the Cellpose wrapper have an idea? @sunyi000 @pavanvidem

Sorry, I have no idea why it's producing different results.
Shall we try not using the Python script anymore? It was developed for Cellpose 2 a long time ago.
I could give it a try. @kostrykin @SaimMomin12

pavanvidem · 2024-09-24T13:09:03Z

As we are now using conda requirements, the differences may be due to some dependency. Let me test this.

bgruening · 2024-10-29T14:22:10Z

What should we do here? Do we assume cellpose knows what they are doing and accept that the outputs have changed so dramatically?

We all use biocontainers, hopefully. We should not use conda for testing; locally or remotely, only biocontainers should be used.

SaimMomin12 · 2024-11-11T10:08:56Z

> What should we do here? Do we assume cellpose knows what they are doing and accept that the outputs have changed so dramatically?
>
> We all use biocontainers, hopefully, we should not use conda for testing, locally or remote only biocontainers should be used.

ping @kostrykin

kostrykin · 2024-11-12T09:03:29Z

> What should we do here? Do we assume cellpose knows what they are doing and accept that the outputs have changed so dramatically?

@bgruening If I understood this comment correctly, then the problem here is that the Cellpose wrapper for one and the same version of Cellpose produces very different results. This makes me concerned regarding reproducibility. If the wrapper now yields different results when being run locally vs. in CI, it might as well yield inconsistent results in the future when run in Galaxy, as long as we do not understand why the inconsistency happens.

@SaimMomin12 You reported the diverging results when (i) using planemo test --update_test_data in comparison to (ii) the results obtained by the CI. Have you been using --biocontainers for (i)? If not, does this issue still persist when using --biocontainers, or are the results consistent with those obtained by the CI then?

IMO, if and only if the answer to this last question is "yes", we can ignore the inconsistency.

kostrykin · 2024-11-12T09:03:35Z

> As we are now using conda requirements, differences maybe due to some dependency. Let me test this.

@pavanvidem Did you find out something?

bgruening · 2024-11-12T09:30:36Z

> @bgruening If I understood this comment correctly, then the problem here is that the Cellpose wrapper for one and the same version of Cellpose produces very different results. This makes me concerned regarding reproducibility. If the wrapper now yields different results when being run locally vs. in CI, it might as well yield inconsistent results in the future when run in Galaxy, as long as we do not understand why the inconsistency happens.

I'm not sure we should assume that. The new tests are stricter and we don't know for sure how it was with the old version + old tests, correct?

To make it easier for all of us, we should always use --biocontainers. There is just one problem: the --biocontainers image is also created on the fly and, depending on when it was created, can have different dependencies. There are two workarounds:

1. Add more requirements to the tool.
2. Create a container for this PR beforehand in https://github.com/BioContainers/multi-package-containers/. This PR will then use that image, and we will all be using the same image (in contrast to one built on the fly).

sunyi000 · 2024-11-12T10:11:22Z

I could try to rework this tool. I tried to use it in one of my test workflows, and it didn't work.
Since the Python code was developed with Cellpose 2, I was thinking of removing it and redoing the tool with pure Cellpose 3 headless options.

pavanvidem · 2024-11-13T14:59:41Z

After trying out many possible combinations of dependencies, I found that the discrepancies most likely come from the opencv library. If we use the pip installation of opencv-python-headless, the results are consistent. The container-based tool v3.0.8 produces the same image every time. The conda-based tool, which uses the opencv conda package, produces a slightly different result on every run.
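To quantify how different two runs are, one option is the intersection over union (IoU) of the segmented foreground. Comparing the foreground rather than raw label values sidesteps the fact that two runs may number the same cells differently. A minimal sketch with a hypothetical `foreground_iou` helper, taking label images as lists of rows:

```python
def foreground_iou(labels_a, labels_b):
    """IoU of the foreground (label > 0) of two label images with identical shapes."""
    if len(labels_a) != len(labels_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(labels_a, labels_b)
    ):
        raise ValueError("label images must have the same shape")
    intersection = union = 0
    for row_a, row_b in zip(labels_a, labels_b):
        for a, b in zip(row_a, row_b):
            intersection += int(a > 0 and b > 0)
            union += int(a > 0 or b > 0)
    # Two empty segmentations are treated as identical.
    return intersection / union if union else 1.0


run_1 = [[0, 1, 1], [0, 1, 1]]
run_2 = [[0, 2, 2], [0, 2, 0]]
print(foreground_iou(run_1, run_2))  # 0.75
```

A foreground IoU close to 1 can still hide merged or split cells, so it complements rather than replaces a per-object comparison.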

Surprisingly, the results from v3.0.9 and v3.0.10 containers are also not reproducible. I suspect they are also using opencv conda package but I couldn't locate the Dockerfile. Can someone please point me to the Dockerfile?

SaimMomin12 mentioned this pull request Sep 2, 2024

Updating tools/cellpose from version 3.0.10 to 3.0.11 #1494

Closed

SaimMomin12 marked this pull request as draft September 17, 2024 08:34

planemo-autoupdate and others added 5 commits September 18, 2024 13:04

Updating tools/cellpose from version 3.0.10 to 3.0.11

ffac25d

Updated test data

34aef8f

Updated test pararmeters

80709fa

test fix

f1b0a60

Changed test to image diff of base branch to see the CI behaviour

cdd5be1

SaimMomin12 force-pushed the tools/cellpose branch from 830bcf8 to cdd5be1 September 18, 2024 11:04

SaimMomin12 mentioned this pull request Oct 10, 2024

Saim momin12 tools/cellpose #1505

Closed
