
Run cellfinder benchmarks on small data #94

Merged: 18 commits merged into main from smg/cellfinder-benchmarks-small on Apr 30, 2024
Conversation

@sfmig (Contributor) commented on Apr 22, 2024

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?
We are exploring a systematic way to benchmark brainglobe workflows using asv.

This PR fixes some issues running the cellfinder workflow benchmarks (1) on a small GIN dataset and (2) on data available locally.

What does this PR do?
This PR involves:

  • edits to the asv config file (mainly so that asv installs the brainglobe-workflows package from the local repo into the benchmarking environment it creates),
  • an update to the setup_cache function of the benchmarks,
  • an option to run the benchmarks on locally available data using an environment variable, and
  • edits to the readme file to reflect these updates.

To run the benchmarks locally on a small dataset from GIN

  1. Check out this branch to get the latest version of the benchmarks locally.

  2. Create a conda environment and pip install asv:

    conda create -n asv-check python=3.10
    conda activate asv-check
    pip install asv
    

    Note that you do not need to install a development version of brainglobe-workflows to run the benchmarks, since asv creates a separate Python virtual environment to run them in. However, for convenience we do include asv in the dev dependencies, so you can also run the benchmarks from a dev environment.

  3. For a quick check, run one iteration per benchmark with

    asv run -q
    
    • You can add -v --show-stderr for more verbose output.
    • This installs the brainglobe-workflows package from the tip of the currently checked-out local branch into the asv virtual environment, and runs the (locally defined) benchmarks against it (see the example invocations below).
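
For example, a more verbose quick run, optionally restricted to a subset of benchmarks, looks like this (the --bench pattern shown is illustrative and depends on how the benchmark modules are named):

    # quick run (one iteration per benchmark), with verbose output and
    # stderr from the benchmarked code shown in the console
    asv run -q -v --show-stderr

    # optionally select a subset of benchmarks by regular expression
    asv run -q --bench cellfinder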

To run the benchmarks (locally) on a locally available dataset

  1. Define a config file for the workflow to benchmark. You can use the default one at brainglobe_workflows/configs/cellfinder.json for reference.

    • Ensure your config file includes an input_data_dir field pointing to the data of interest.
    • Edit the names of the signal and background directories if required. By default, they are assumed to be in signal and background subdirectories under input_data_dir, but these defaults can be overridden with the signal_subdir and background_subdir fields (see the config sketch after these steps).
  2. Create and activate an environment with asv (follow steps 1 and 2 from above).

  3. Run the benchmarks in "quick mode", passing the path to your config file in the CONFIG_PATH environment variable. On Unix systems:

    CONFIG_PATH=/path/to/your/config/file asv run -q
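
For reference, here is a minimal sketch of such a config file, using the field names described in step 1 (the paths are placeholders, and the default cellfinder.json may define additional fields that the workflow needs):

    {
        "input_data_dir": "/path/to/my/data",
        "signal_subdir": "signal_channel",
        "background_subdir": "background_channel"
    }

With this file saved at, for example, /path/to/my/config.json, the command above becomes CONFIG_PATH=/path/to/my/config.json asv run -q.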
    

Troubleshooting

You may find that the conda environment creation fails because of this issue. This seems to be because asv assumes a conda syntax that changed with the latest conda release (in conda 24.3.0, --force became --yes).

A PR is on the way; as a temporary workaround, you can pin conda from the base environment with conda install -y "conda<24.3", as shown below.
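
Concretely, the workaround looks like this, run from the base environment before asv creates its benchmarking environments:

    # pin conda below 24.3 in the base environment, so that the --force
    # syntax asv still relies on keeps working
    conda activate base
    conda install -y "conda<24.3"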

References

See issue #9.

Also related is issue #98, which I am currently investigating.

Further context

We currently have asv benchmarks for the three main steps involved in the cellfinder workflow:

  • reading input data,
  • detecting and classifying cells, and
  • saving the results to file.

We also have a benchmark for the full workflow.
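
For orientation, asv discovers benchmarks as functions or methods whose names start with time_, and a setup_cache function runs once per environment/commit, with its return value passed to setup and to each timed method. Below is a minimal, self-contained sketch of this layout; it is illustrative only and is not the actual code in benchmarks/cellfinder_core.py (all names and numbers are placeholders):

    # Illustrative sketch of an asv benchmark module; NOT the real
    # cellfinder benchmarks.
    def _fake_workflow(n_cells):
        # stand-in for the cellfinder workflow; just burns a little CPU
        return sum(i * i for i in range(n_cells))

    class TimeFullWorkflow:
        # asv times every method whose name starts with time_
        def setup_cache(self):
            # runs once per environment/commit; the returned value is passed
            # as the first argument to setup() and to each time_* method
            return {"n_cells": 10_000}

        def setup(self, cache):
            self.n_cells = cache["n_cells"]

        def time_full_workflow(self, cache):
            _fake_workflow(self.n_cells)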

We envision the benchmarks being useful to developers in three main ways:

  • Developers can run the available benchmarks locally on a small test dataset fetched from GIN. For this, the cellfinder workflow is run with the default config that ships with the package (at brainglobe_workflows/configs/cellfinder.json).
  • Developers can also run these benchmarks on data they have stored locally. For this, the workflow is run with a custom config, whose path is passed to the benchmarks as an environment variable.
  • We also plan to run the benchmarks on an internal runner using a larger dataset, of the scale we expect users to be handling. The results of these benchmarks will be made publicly available. This is not yet implemented.

This is all explained in the README.

A reminder of how asv works:

  • asv creates a virtual environment where it installs the package to be benchmarked (in our case, brainglobe-workflows). This virtual environment is defined in the asv config file (asv.conf.json).
  • We set up asv so that the version of brainglobe-workflows installed in the asv-managed virtual environment is the one at the tip of the currently checked-out branch (i.e., the version at HEAD). This way developers can check whether their local branch introduces regressions. Alternatively, we can choose to install a version of brainglobe-workflows fetched from GitHub (for example, the tip of the remote main branch).
  • asv will look for benchmarks under the benchmarks folder (which is at the same level as the asv.conf.json file), and run them.
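
As a rough sketch of the kind of settings involved (this is not the exact content of our asv.conf.json; the keys shown are standard asv configuration options, and asv.conf.json accepts //-style comments):

    {
        "version": 1,
        "project": "brainglobe-workflows",
        // benchmark the local repository rather than a remote clone
        "repo": "..",
        // asv-managed virtual environments are conda-based
        "environment_type": "conda",
        // benchmarks are discovered in the benchmarks folder next to this file
        "benchmark_dir": "benchmarks"
    }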

How has this PR been tested?

The benchmarks are checked with a CI job, rather than with explicit tests. This follows the general approach in the field; see #96 for more details.

Since we don't plan to test the benchmarks with pytest, I omitted the benchmarks from coverage.

Is this a breaking change?

No.

Does this PR require an update to the documentation?

The README has been updated to better reflect the current status.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality (unit & integration) -- this is covered in PR #96 (Check benchmarks on CI)
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

codecov bot commented Apr 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.45%. Comparing base (b5f62ef) to head (f800053).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #94      +/-   ##
==========================================
+ Coverage   79.38%   84.45%   +5.06%     
==========================================
  Files          18       17       -1     
  Lines         917      862      -55     
==========================================
  Hits          728      728              
+ Misses        189      134      -55     


@sfmig marked this pull request as ready for review April 25, 2024 11:53
@sfmig requested a review from a team April 25, 2024 11:54
@alessandrofelder (Member) left a comment:

Happy with this - just one tiny suggestion.

Review comment on benchmarks/cellfinder_core.py (outdated; resolved).
@sfmig force-pushed the smg/cellfinder-benchmarks-small branch from fdc29fa to f800053 on April 30, 2024 09:53
@sfmig merged commit 34b07ec into main on Apr 30, 2024
11 checks passed
@sfmig deleted the smg/cellfinder-benchmarks-small branch on April 30, 2024 14:41