Merge pull request #111 from coralnet/extractors-rearrange-modules
Feature extractor refactoring + more granular test-related config
StephenChan authored Dec 3, 2024
2 parents 290f915 + 7bb1128 commit 87c6818
Showing 29 changed files with 743 additions and 686 deletions.
22 changes: 22 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,27 @@
# Changelog

## 0.11.0 (WIP)

- Feature extractor class changes:

- `FeatureExtractor` and its built-in subclasses should now be imported like `from spacer.extractors import <class>` instead of `from spacer.extract_features import <class>`.

- High-level usage of `FeatureExtractor` instances is the same as before - invoking `__call__()` performs feature extraction on an image. However, subclass implementations should now generally define a `patches_to_features()` method instead of overriding `__call__()`.

- There is now a `TorchExtractor` class which has details that are specific to PyTorch but not to EfficientNet. So, it's suitable as a starting point for a custom PyTorch extractor that uses another type of network. `EfficientNetExtractor` now inherits from `TorchExtractor`.

- There are now `CROP_SIZE` and `BATCH_SIZE` class-level variables available.

- Config and test changes:

- Some former usages of `TEST_BUCKET` have been changed to `CN_FIXTURES_BUCKET`, to more clearly denote test fixtures that are currently only available to CoralNet devs.

- The remaining usages of `TEST_BUCKET` are now usable by anyone with an AWS account. This can be any S3 bucket that you have read and write access to.

- `TEST_EXTRACTORS_BUCKET` is now known as `CN_TEST_EXTRACTORS_BUCKET`, again denoting fixtures currently only available to CoralNet devs.

Related to these changes, more tests can now run without CoralNet AWS credentials. More tests can also run in GitHub Actions CI, even though that doesn't use AWS at all.
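
The `patches_to_features()` change described in this changelog entry can be pictured with a small self-contained sketch. Everything here is a hypothetical stand-in rather than spacer's actual code: `SketchExtractor`, `MeanExtractor`, the `crop()` helper, and the method signatures are assumptions made for illustration, and images are plain nested lists instead of real image objects. The idea is only that the base class's `__call__()` owns the shared cropping step, while a subclass supplies `patches_to_features()` and may override class-level values such as `CROP_SIZE` and `BATCH_SIZE`.

```python
from abc import ABC, abstractmethod


class SketchExtractor(ABC):
    """Hypothetical stand-in for a base extractor class (not spacer's code)."""

    CROP_SIZE = 4   # side length, in pixels, of the square cropped per point
    BATCH_SIZE = 2  # how many patches a subclass processes at once

    def __call__(self, image, rowcols):
        # Shared logic lives here: crop one patch per point, then delegate.
        patches = [self.crop(image, row, col) for row, col in rowcols]
        return self.patches_to_features(patches)

    def crop(self, image, row, col):
        # This sketch does no padding; it simply rejects points near the edge.
        half = self.CROP_SIZE // 2
        top, left = row - half, col - half
        bottom, right = top + self.CROP_SIZE, left + self.CROP_SIZE
        if top < 0 or left < 0 or bottom > len(image) or right > len(image[0]):
            raise ValueError("point too close to the image edge for this sketch")
        return [line[left:right] for line in image[top:bottom]]

    @abstractmethod
    def patches_to_features(self, patches):
        """Return one feature vector per patch."""


class MeanExtractor(SketchExtractor):
    """Toy subclass: the 'feature vector' is just the patch's mean value."""

    CROP_SIZE = 2

    def patches_to_features(self, patches):
        features = []
        # Walk through the patches in chunks of BATCH_SIZE.
        for start in range(0, len(patches), self.BATCH_SIZE):
            for patch in patches[start:start + self.BATCH_SIZE]:
                values = [v for line in patch for v in line]
                features.append([sum(values) / len(values)])
        return features


# A 6x6 "image" whose pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(6)] for r in range(6)]
features = MeanExtractor()(image, [(1, 1), (3, 4)])
```

Callers keep invoking the instance directly, so a subclass written this way does not change how the extractor is used.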

## 0.10.0

- AWS credentials can now be obtained through the following methods, in addition to spacer config values as before:
8 changes: 4 additions & 4 deletions README.md
@@ -111,7 +111,7 @@ The `tasks.py` module has four functions which comprise the main interface of py

The first step when analyzing an image, or preparing an image as training data, is extracting [features](https://en.wikipedia.org/wiki/Feature_(computer_vision)) from the image. For this step, you specify a set of points (pixel locations) in the image which you want to analyze. At each point, spacer will crop a square of pixels centered around that location and extract features based on that square.

-You'll also need a feature extractor, but spacer does not provide one out of the box. Spacer's `extract_features.py` provides the Python classes `EfficientNetExtractor` for loading EfficientNet extractors in PyTorch format (CoralNet 1.0's default extraction scheme), and `VGG16CaffeExtractor` for loading VGG16 extractors in Caffe format (CoralNet's legacy extraction scheme).
+You'll also need a feature extractor, but spacer does not provide one out of the box. `spacer/extractors` includes the Python classes `EfficientNetExtractor` for loading EfficientNet extractors in PyTorch format (CoralNet 1.0's default extraction scheme), and `VGG16CaffeExtractor` for loading VGG16 extractors in Caffe format (CoralNet's legacy extraction scheme).

You'll either want to match one of these schemes so you can use the provided classes, or you'll have to write your own extractor class which inherits from the base class `FeatureExtractor`. Between the provided classes, the easier one to use will probably be `EfficientNetExtractor`, because Caffe is old software which is more complicated to install.

@@ -120,7 +120,7 @@ If you're loading the extractor files remotely (from S3 or from a URL), the file
The output of `extract_features()` is a single feature-vector file, which is a JSON file that is deserializable using the `data_classes.ImageFeatures` class. Example usage:

```python
-from spacer.extract_features import EfficientNetExtractor
+from spacer.extractors import EfficientNetExtractor
from spacer.messages import DataLocation, ExtractFeaturesMsg
from spacer.tasks import extract_features

@@ -369,7 +369,7 @@ This basically does `extract_features` and `classify_features` together in one g
Takes an image, a list of pixel locations on that image, a feature extractor, and a classifier. Produces prediction results (scores) for the image points, as posterior probabilities for each class. Example:

```python
-from spacer.extract_features import EfficientNetExtractor
+from spacer.extractors import EfficientNetExtractor
from spacer.messages import DataLocation, ClassifyImageMsg
from spacer.tasks import classify_image

@@ -406,7 +406,7 @@ for row, col, scores in return_message.scores:

If you are using the docker build or local install, you can run the test suite by running `python -m unittest` from the `spacer` directory.

-- Expect many tests to be skipped, since most test fixtures aren't set up for public access yet.
+- Some tests require Amazon S3 config, Caffe installation, and/or CoralNet infrastructure access. The applicable tests will be skipped if your config doesn't support them.

- Run just a single test module with a command like `python -m unittest tests.test_tasks`, or just `python -m tests.test_tasks` (the latter invokes the `if __name__ == '__main__':` part of the module).
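
The skip-when-unconfigured behavior described above can be sketched with the standard `unittest` skip decorators. This is only an illustration of the pattern, not the project's actual test code: the environment variable name `SPACER_TEST_BUCKET` and the test class are assumptions made for this example.

```python
import os
import unittest

# Assumption: the bucket name is supplied through an environment variable
# with this name. Check spacer's config documentation for the real mechanism.
TEST_BUCKET = os.environ.get('SPACER_TEST_BUCKET')


@unittest.skipUnless(TEST_BUCKET, "requires TEST_BUCKET to be configured")
class S3RoundTripSketchTest(unittest.TestCase):
    """Hypothetical test case that only makes sense with a bucket configured."""

    def test_bucket_name_is_set(self):
        self.assertTrue(TEST_BUCKET)


# Run the test case programmatically and keep the result for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(S3RoundTripSketchTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With the variable unset, the run reports the test as skipped rather than failed, which matches the behavior the bullet describes.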

132 changes: 0 additions & 132 deletions spacer/caffe_utils.py

This file was deleted.

32 changes: 17 additions & 15 deletions spacer/config.py
@@ -258,21 +258,21 @@ def __exit__(self, exc_type, exc, exc_tb):
'minibatch'
]

-# For extractors used in unit tests.
-TEST_EXTRACTORS_BUCKET = get_config_value(
-'TEST_EXTRACTORS_BUCKET', default=None)
-# For other fixtures used in unit tests.
-#
-# At least for now, the main reason these bucket names are pulled from
-# config is to not expose the bucket names used by the PySpacer core devs.
-# However, since these test files are not publicly linked and need to
-# live in an S3 bucket with specific filenames (specified by TEST_EXTRACTORS
-# and individual tests), the tests are still onerous to set up for anyone
-# besides the core devs. This should be addressed sometime.
+# Amazon S3 bucket for temporarily storing data during unit tests.
+# You'll need write access to this bucket to run the applicable tests.
TEST_BUCKET = get_config_value('TEST_BUCKET', default=None)
-# A few other fixtures live here.
+# A few testing fixtures live here.
LOCAL_FIXTURE_DIR = str(APP_DIR / 'tests' / 'fixtures')
+
+# And the rest of the testing fixtures live in these CoralNet-owned
+# private buckets. (CoralNet devs should specify the names of the buckets
+# in their environment.)
+# These tests and fixtures should be reorganized sometime so that anyone can
+# run the applicable tests.
+CN_TEST_EXTRACTORS_BUCKET = get_config_value(
+'CN_TEST_EXTRACTORS_BUCKET', default=None)
+CN_FIXTURES_BUCKET = get_config_value('CN_FIXTURES_BUCKET', default=None)

STORAGE_TYPES = [
's3',
'filesystem',
@@ -315,10 +315,12 @@ def __exit__(self, exc_type, exc, exc_tb):
# This is required if you're loading feature extractors from a remote
# source (S3 or URL).
'EXTRACTORS_CACHE_DIR',
-# These are required to run certain unit tests. They're also not really
-# usable by anyone besides spacer's core devs at the moment.
-'TEST_EXTRACTORS_BUCKET',
+# This is required for S3 unit tests.
'TEST_BUCKET',
+# These are required to run certain unit tests. They're also only usable
+# by CoralNet devs at the moment.
+'CN_TEST_EXTRACTORS_BUCKET',
+'CN_FIXTURES_BUCKET',
# These can just be configured as needed, or left as defaults.
'LOG_DESTINATION',
'LOG_LEVEL',
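
To see which of the test-related settings in this hunk would be available in a given environment, a lookup along the following lines may help. This is a rough sketch under stated assumptions, not spacer's implementation: `get_config_value_sketch` is a hypothetical stand-in for `get_config_value`, and the `SPACER_` environment variable prefix is assumed for illustration.

```python
import os

# The three test-related settings named in the config.py hunk.
TEST_SETTINGS = [
    'TEST_BUCKET',
    'CN_TEST_EXTRACTORS_BUCKET',
    'CN_FIXTURES_BUCKET',
]


def get_config_value_sketch(key, default=None, environ=os.environ):
    """Hypothetical stand-in for a config lookup (not spacer's own function).

    Assumes settings come from environment variables named 'SPACER_' + key.
    """
    return environ.get('SPACER_' + key, default)


def configured_test_settings(environ):
    """Map each test-related setting to whether it has a value."""
    return {
        key: get_config_value_sketch(key, environ=environ) is not None
        for key in TEST_SETTINGS
    }


# Someone outside CoralNet would typically only provide their own bucket.
status = configured_test_settings({'SPACER_TEST_BUCKET': 'my-own-bucket'})
```

In that scenario only `TEST_BUCKET` is reported as configured, so tests relying on the `CN_`-prefixed buckets would be the ones expected to skip.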
14 changes: 14 additions & 0 deletions spacer/extractors/__init__.py
@@ -0,0 +1,14 @@
"""
For possible future network extension
"""
from .base import DummyExtractor, FeatureExtractor
from .efficientnet import EfficientNetExtractor
from .vgg16 import VGG16CaffeExtractor


__all__ = [
'DummyExtractor',
'EfficientNetExtractor',
'FeatureExtractor',
'VGG16CaffeExtractor',
]
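
Code that has to work both before and after this import-path change could fall back from the new module to the old one. The helper below is a sketch of that idea using only `importlib`. The helper names are hypothetical, and the two module paths are taken from the changelog entry in this commit.

```python
import importlib


def import_first_available(class_name, module_names):
    """Return `class_name` from the first listed module that provides it."""
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module missing in this environment; try the next candidate.
            continue
        if hasattr(module, class_name):
            return getattr(module, class_name)
    raise ImportError(f"{class_name} not found in any of {module_names}")


def load_efficientnet_extractor_class():
    """Prefer the new import path, falling back to the pre-0.11.0 one."""
    return import_first_available(
        'EfficientNetExtractor',
        ['spacer.extractors', 'spacer.extract_features'])


# Demonstration with standard-library modules, so it runs without spacer.
ordered_dict_class = import_first_available(
    'OrderedDict', ['no_such_module_xyz', 'collections'])
```

Projects that can pin spacer to 0.11.0 or later would not need this and can use the plain `from spacer.extractors import ...` form.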