
Run cellfinder benchmarks on small data #94

Merged: 18 commits merged into main from smg/cellfinder-benchmarks-small on Apr 30, 2024
Conversation

@sfmig (Contributor) commented on Apr 22, 2024

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?
We are exploring a systematic way to benchmark brainglobe workflows using asv.

This PR fixes some issues running the cellfinder workflow benchmarks (1) on a small GIN dataset and (2) on data available locally.

What does this PR do?
This PR involves:

  • edits to the asv config file (mainly so that asv installs the brainglobe-workflows package from the local repo into the benchmarking environment it creates),
  • an update to the setup_cache function of the benchmarks,
  • an option to run the benchmarks on locally available data using an environment variable, and
  • edits to the readme file to reflect these updates.

To run the benchmarks locally on a small dataset from GIN

  1. Check out this branch to get the latest version of the benchmarks locally.

  2. Create a conda environment and pip install asv:

    conda create -n asv-check python=3.10
    conda activate asv-check
    pip install asv
    

    Note that you do not need to install a development version of brainglobe-workflows to run the benchmarks, since asv creates a separate Python virtual environment to run them in. However, for convenience we do include asv in the dev dependencies, so you can also run the benchmarks from a dev environment.

  3. For a quick check, run one iteration per benchmark with

    asv run -q
    
    • You can add -v --show-stderr for more verbose output.
    • This installs the brainglobe-workflows package from the tip of the currently checked-out local branch into the asv virtual environment, and runs the (locally defined) benchmarks against it (see the example invocations below).
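
For example, a more verbose quick run, optionally restricted to a subset of benchmarks, looks like this (the --bench pattern shown is illustrative and depends on how the benchmark modules are named):

    # quick run (one iteration per benchmark), with verbose output and
    # stderr from the benchmarked code shown in the console
    asv run -q -v --show-stderr

    # optionally select a subset of benchmarks by regular expression
    asv run -q --bench cellfinder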

To run the benchmarks (locally) on a locally available dataset

  1. Define a config file for the workflow to benchmark. You can use the default one at brainglobe_workflows/configs/cellfinder.json for reference.

    • Ensure your config file includes an input_data_dir field pointing to the data of interest.
    • Edit the names of the signal and background directories if required. By default, they are assumed to be in signal and background subdirectories under input_data_dir, but these defaults can be overridden with the signal_subdir and background_subdir fields (see the config sketch after these steps).
  2. Create and activate an environment with asv (follow steps 1 and 2 from above).

  3. Run the benchmarks in "quick mode", passing the path to your config file in the CONFIG_PATH environment variable. On Unix systems:

    CONFIG_PATH=/path/to/your/config/file asv run -q
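
For reference, here is a minimal sketch of such a config file, using the field names described in step 1 (the paths are placeholders, and the default cellfinder.json may define additional fields that the workflow needs):

    {
        "input_data_dir": "/path/to/my/data",
        "signal_subdir": "signal_channel",
        "background_subdir": "background_channel"
    }

With this file saved at, for example, /path/to/my/config.json, the command above becomes CONFIG_PATH=/path/to/my/config.json asv run -q.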
    

Troubleshooting

You may find that the conda environment creation fails because of this issue. This seems to be because asv assumes a conda syntax that changed with the latest conda release (in conda 24.3.0, --force became --yes).

A PR is on the way; as a temporary workaround, you can pin conda from the base environment with conda install -y "conda<24.3", as shown below.
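
Concretely, the workaround looks like this, run from the base environment before asv creates its benchmarking environments:

    # pin conda below 24.3 in the base environment, so that the --force
    # syntax asv still relies on keeps working
    conda activate base
    conda install -y "conda<24.3"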

References

See issue #9.

Also related is issue #98, which I am currently investigating.

Further context

We currently have asv benchmarks for the three main steps involved in the cellfinder workflow:

  • reading input data,
  • detecting and classifying cells, and
  • saving the results to file.

We also have a benchmark for the full workflow.
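
For orientation, asv discovers benchmarks as functions or methods whose names start with time_, and a setup_cache function runs once per environment/commit, with its return value passed to setup and to each timed method. Below is a minimal, self-contained sketch of this layout; it is illustrative only and is not the actual code in benchmarks/cellfinder_core.py (all names and numbers are placeholders):

    # Illustrative sketch of an asv benchmark module; NOT the real
    # cellfinder benchmarks.
    def _fake_workflow(n_cells):
        # stand-in for the cellfinder workflow; just burns a little CPU
        return sum(i * i for i in range(n_cells))

    class TimeFullWorkflow:
        # asv times every method whose name starts with time_
        def setup_cache(self):
            # runs once per environment/commit; the returned value is passed
            # as the first argument to setup() and to each time_* method
            return {"n_cells": 10_000}

        def setup(self, cache):
            self.n_cells = cache["n_cells"]

        def time_full_workflow(self, cache):
            _fake_workflow(self.n_cells)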

We envision the benchmarks being useful to developers in three main ways:

  • Developers can run the available benchmarks locally on a small test dataset fetched from GIN. For this, the cellfinder workflow is run with the default config that ships with the package (at brainglobe_workflows/configs/cellfinder.json).
  • Developers can also run these benchmarks on data they have stored locally. For this, the workflow is run with a custom config, whose path is passed to the benchmarks as an environment variable.
  • We also plan to run the benchmarks on an internal runner using a larger dataset, of the scale we expect users to be handling. The results of these benchmarks will be made publicly available. This is not yet implemented.

This is all explained in the README.

A reminder of how asv works:

  • asv creates a virtual environment where it installs the package to be benchmarked (in our case, brainglobe-workflows). This virtual environment is defined in the asv config file (asv.conf.json).
  • We set up asv so that the version of brainglobe-workflows installed in the asv-managed virtual environment is the one at the tip of the currently checked-out branch (i.e., the version at HEAD). This way developers can check whether their local branch introduces regressions. Alternatively, we can choose to install a version of brainglobe-workflows fetched from GitHub (for example, the tip of the remote main branch).
  • asv will look for benchmarks under the benchmarks folder (which is at the same level as the asv.conf.json file), and run them.
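
As a rough sketch of the kind of settings involved (this is not the exact content of our asv.conf.json; the keys shown are standard asv configuration options, and asv.conf.json accepts //-style comments):

    {
        "version": 1,
        "project": "brainglobe-workflows",
        // benchmark the local repository rather than a remote clone
        "repo": "..",
        // asv-managed virtual environments are conda-based
        "environment_type": "conda",
        // benchmarks are discovered in the benchmarks folder next to this file
        "benchmark_dir": "benchmarks"
    }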

How has this PR been tested?

The benchmarks are checked with a CI job, rather than with explicit tests. This follows the general approach in the field; see #96 for more details.

Since we don't plan to test the benchmarks with pytest, I omitted the benchmarks from coverage.

Is this a breaking change?

No.

Does this PR require an update to the documentation?

The README has been updated to better reflect the current status.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality (unit & integration) -- this is covered in PR #96 (Check benchmarks on CI)
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

codecov bot commented Apr 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.45%. Comparing base (b5f62ef) to head (f800053).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #94      +/-   ##
==========================================
+ Coverage   79.38%   84.45%   +5.06%     
==========================================
  Files          18       17       -1     
  Lines         917      862      -55     
==========================================
  Hits          728      728              
+ Misses        189      134      -55     


@sfmig marked this pull request as ready for review April 25, 2024 11:53
@sfmig requested a review from a team April 25, 2024 11:54
@alessandrofelder (Member) left a comment:

Happy with this - just one tiny suggestion.

Review comment on benchmarks/cellfinder_core.py (outdated; resolved).
@sfmig force-pushed the smg/cellfinder-benchmarks-small branch from fdc29fa to f800053 on April 30, 2024 09:53
@sfmig merged commit 34b07ec into main on Apr 30, 2024
11 checks passed
@sfmig deleted the smg/cellfinder-benchmarks-small branch on April 30, 2024 14:41