update autolamella ml page
patrickcleeve2 committed Jan 14, 2024
1 parent db0e1cb commit 088d84b
Showing 11 changed files with 213 additions and 44 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -158,3 +158,6 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
docs/autolamella/assets/manual_labelling.webm
docs/autolamella/assets/model_assisted_labelling.webm
docs/autolamella/assets/sam_assisted_labelling.webm
Binary file added docs/autolamella/assets/show_liftout.png
Binary file added docs/autolamella/assets/show_serial_liftout.png
Binary file added docs/autolamella/assets/show_waffle.png
13 changes: 0 additions & 13 deletions docs/autolamella/case_study_dataset_and_models.md

This file was deleted.

196 changes: 196 additions & 0 deletions docs/autolamella/ml.md
@@ -0,0 +1,196 @@
# Machine Learning

Machine Learning is a core part of AutoLamella. Here we highlight the datasets, models, and tooling we have developed as part of the project.

## Dataset

### AutoLamella Dataset

The AutoLamella dataset consists of images from multiple lamella preparation methods. All data is annotated for semantic segmentation, and is available through the Hugging Face API at [patrickcleeve/autolamella](https://huggingface.co/datasets/patrickcleeve/autolamella).

Summary

| Dataset / Method | Train | Test | Total |
| ----------- | ----------- | -----------| -----------|
| Waffle | 214 | 76 | 290 |
| Liftout | 801 | 168 | 969 |
| Serial Liftout | 301 | 111 | 412 |
| **Full** | **1316** | **355** | **1671** |


Details about the datasets can be found in `summary.csv` in the dataset directory.

### Labels

Currently, the dataset is labelled for the following classes. In the future, we will add additional labels for objects such as ice contamination. If you would like to label this data, please see the labelling tools to get started.

```yaml
CLASS_LABELS: # autolamella
0: "background"
1: "lamella"
2: "manipulator"
3: "landing_post"
4: "copper_adaptor"
5: "volume_block"
```
#### Downloading Data
To download datasets, you can use the Hugging Face API:
```python

from datasets import load_dataset

# download waffle dataset
ds = load_dataset("patrickcleeve/autolamella", name="waffle")

# download liftout dataset
ds = load_dataset("patrickcleeve/autolamella", name="liftout")

# download serial-liftout dataset
ds = load_dataset("patrickcleeve/autolamella", name="serial-liftout")

# download test split only
ds = load_dataset("patrickcleeve/autolamella", name="waffle", split="test")

```

To display images and annotations:

```python
# show a random image and its annotation (training split)
import random
import numpy as np
import matplotlib.pyplot as plt
from fibsem.segmentation.utils import decode_segmap_v2

# pick a random sample (randint is inclusive, so subtract 1 from the length)
idx = random.randint(0, len(ds["train"]) - 1)
image = np.asarray(ds["train"][idx]["image"])
mask = np.asarray(ds["train"][idx]["annotation"])

# metadata
split = ds["train"].split
config_name = ds["train"].config_name

plt.title(f"{config_name}-{split}-{idx:02d}")
plt.imshow(image, cmap="gray", alpha=0.7)
plt.imshow(decode_segmap_v2(mask), alpha=0.3)
plt.axis("off")
plt.show()

```

| Waffle | Liftout | Serial Liftout |
| ----------- | ----------- | ----------- |
| ![WaffleData](assets/show_waffle.png) | ![LiftoutData](assets/show_liftout.png) | ![SerialLiftoutData](assets/show_serial_liftout.png) |



### Acknowledgements

- Waffle and Liftout data from Monash
- Serial Liftout data from MPI


### Dataset Format

Image data is stored as 8-bit grayscale TIFFs, and labels are saved as class index maps (also 8-bit TIFFs). Corresponding images and labels share the same filename, with the labels located in the labels/ directory.

The file layout of the dataset is:

```bash
autolamella-dataset/
summary.csv
autolamella-waffle/
train/
image-0000.tif
labels/
image-0000.tif
test/
autoliftout/
serial-liftout/
```
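As a minimal sketch of loading a matching image/label pair directly from disk (assuming `tifffile` is installed; the paths below are illustrative):

```python
import os

import numpy as np
import tifffile as tf

# illustrative paths; adjust to wherever the dataset was downloaded
data_dir = "autolamella-dataset/autolamella-waffle/train"
fname = "image-0000.tif"

# image and label share the same filename; labels live in labels/
image = tf.imread(os.path.join(data_dir, fname))            # 8-bit grayscale image
label = tf.imread(os.path.join(data_dir, "labels", fname))  # 8-bit class index map

print(image.shape, image.dtype)  # e.g. (1024, 1536) uint8
print(np.unique(label))          # class indices present, e.g. [0 1 2]
```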

### Keypoint Dataset

The keypoint dataset is used for model evaluation, to mimic as closely as possible the online performance of the models when running on an actual FIBSEM.

[Under Construction]
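In the meantime, an illustrative sketch of the idea (not the project's actual pipeline): a keypoint such as the lamella centre can be derived from a segmentation mask, assuming `scikit-image` is installed:

```python
import numpy as np
from skimage.measure import label, regionprops

LAMELLA_CLASS = 1  # from CLASS_LABELS above

def mask_to_keypoint(mask: np.ndarray, class_idx: int = LAMELLA_CLASS):
    """Return the centroid (row, col) of the largest region of the given class."""
    regions = regionprops(label(mask == class_idx))
    if not regions:
        return None  # class not present in this mask
    largest = max(regions, key=lambda r: r.area)
    return largest.centroid
```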

## Models

### Baseline Models

We have trained models for each method on the corresponding subset of the data. Models are available on Hugging Face at patrickcleeve/openfibsem-baseline, which also includes archived development models. You can try out a demo of these models in the [autolamella-demo space](https://huggingface.co/spaces/patrickcleeve/autolamella-demo).

The current best performing models for each method are:

| Method | Dataset | Checkpoint |
| ----------- | ----------- | -----------|
| AutoLamella Waffle | Waffle | autolamella-waffle-20240107.pt |
| AutoLamella Liftout | Full | autolamella-mega-20240107.pt |
| AutoLamella Serial Liftout | Serial Liftout | autolamella-serial-liftout-20240107.pt |
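Checkpoints can be fetched with the `huggingface_hub` client; a minimal sketch (the checkpoint filename comes from the table above):

```python
from huggingface_hub import hf_hub_download

# download the waffle checkpoint from the baseline repository
checkpoint_path = hf_hub_download(
    repo_id="patrickcleeve/openfibsem-baseline",
    filename="autolamella-waffle-20240107.pt",
)
print(checkpoint_path)  # local cached path to the .pt file
```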

### Mega Model

We are developing a combined model for all of these methods, called the mega model.

Currently the mega model is outperformed by the specific waffle and serial liftout models, but performs better for liftout. Our initial thinking is that the waffle and serial liftout datasets each come from a single sample (and, in the serial liftout case, a single run), so the specialised models are likely overfit to those conditions. The liftout dataset, by contrast, contains multiple samples and a much more diverse set of images, and so benefits more from the increased size and variation of the combined dataset.

Therefore, we are very keen to incorporate more varied training data for the other methods. If you have additional data you would like to contribute to this model, please get in contact.

## Tools

This section highlights some of the tools developed.

For specific details about concepts and terminology, please refer to the [Concepts Page](../openfibsem/concepts.md).

### Data Collection and Curation (Supervised Mode)

When running AutoLamella, imaging data is logged and saved for later use. In supervised mode, every correction the user makes is logged and classified. We use this feedback to automatically split the dataset and to generate a set of evaluation keypoints for testing. This form of active learning enables efficient model improvement by only including data that the model fails to predict correctly.
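A minimal sketch of this curation logic (the record structure and threshold are hypothetical, not the actual AutoLamella implementation): detections the user corrected become training data, while the rest become evaluation keypoints.

```python
from dataclasses import dataclass

@dataclass
class DetectionRecord:
    image_path: str
    predicted_xy: tuple[float, float]  # model's detected keypoint
    corrected_xy: tuple[float, float]  # position after user supervision

def curate(records: list[DetectionRecord], tol: float = 5.0):
    """Split logged detections: corrections indicate model failures,
    which are the most valuable images to add to the training set."""
    train, evaluation = [], []
    for rec in records:
        dx = rec.corrected_xy[0] - rec.predicted_xy[0]
        dy = rec.corrected_xy[1] - rec.predicted_xy[1]
        moved = (dx**2 + dy**2) ** 0.5 > tol  # user moved the detection
        (train if moved else evaluation).append(rec)
    return train, evaluation
```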

### Data Labelling (Model Assisted Labelling)

We developed a napari plugin for labelling images for semantic segmentation. The plugin supports manual labelling and model assisted labelling.

Model assisted labelling supports two kinds of model: segmentation models and segment anything models. Segmentation models allow pre-labelling of data using a pretrained segmentation model, while segment anything labelling uses a generic Segment Anything model driven by user prompts. The user can then manually edit the segmentation to correct any model errors.

In practice, these two modes are useful at different stages of development. For example, when labelling a new dataset, or one where model performance is poor, Segment Anything lets you quickly generate segmentation masks with interactive prompting. Once you have a decent model, segmentation model labelling is much faster, and usually only requires minor edits from the user.
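For a flavour of the prompted workflow, here is a sketch of point-prompted segmentation using the Hugging Face `transformers` SAM implementation (assuming `transformers` and `torch` are installed; the image path and prompt coordinates are illustrative):

```python
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
model = SamModel.from_pretrained("facebook/sam-vit-base").to(device)

image = Image.open("image-0000.tif").convert("RGB")  # illustrative filename
input_points = [[[450, 600]]]  # one (x, y) point prompt on the object

inputs = processor(image, input_points=input_points, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

# resize the predicted masks back to the original image size
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
print(masks[0].shape)  # (num_prompts, num_masks, H, W)
```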

#### Example: Serial Liftout Data

When we received the serial liftout data, we didn't have any models trained for it, so it was much faster to use SAM to label the training dataset. Once an initial model was trained, we switched to using the segmentation model and manually editing the masks.

![Segment Anything Labelling](assets/sam_assisted_labelling.gif)
Segment Anything Labelling

![Model Assisted Labelling](assets/model_assisted_labelling.gif)
Model Assisted Labelling

With the assistance of this tooling, the full dataset (all methods) now contains around 1600 labelled images.

### Model Evaluation

We provide tools for evaluating the performance of a number of different models on the keypoint detection task.

The evaluation runs each model checkpoint through the detection pipeline, saves the results, and compares them to the provided ground truth labels. Each individual image can be plotted, as well as the full evaluation statistics. This evaluation pipeline is useful for checking model improvement and preventing regressions on previously successful tasks.
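At its core, the comparison reduces to distances between predicted and ground truth keypoints; a sketch of such a metric (the helper and threshold are hypothetical, not the full pipeline):

```python
import numpy as np

def keypoint_errors(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Euclidean error between predicted and ground truth keypoints.

    pred, gt: arrays of shape (N, 2), in pixel coordinates.
    """
    dist = np.linalg.norm(pred - gt, axis=1)
    return {
        "mean_px": float(dist.mean()),
        "median_px": float(np.median(dist)),
        "pct_under_25px": float((dist < 25).mean()),  # illustrative threshold
    }
```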

Here we are showing evaluations from the development models.

[TODO]

## Experimental

### Generating Labels

From the segmentation masks, you can generate other useful labelled objects. The script `generate_segmentation_objects.py` generates bounding boxes and instance segmentations for each object in the mask.

These should be straightforward to convert to COCO format for training.
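A sketch of the underlying conversion using `scikit-image` (illustrative only; see the script itself for the actual implementation):

```python
import numpy as np
from skimage.measure import label, regionprops

def mask_to_objects(mask: np.ndarray) -> list[dict]:
    """Derive per-object bounding boxes and instance masks from a class index map."""
    objects = []
    for class_idx in np.unique(mask):
        if class_idx == 0:  # skip background
            continue
        instances = label(mask == class_idx)  # connected components per class
        for region in regionprops(instances):
            y0, x0, y1, x1 = region.bbox
            objects.append({
                "class": int(class_idx),
                "bbox": [x0, y0, x1 - x0, y1 - y0],  # COCO-style [x, y, w, h]
                "instance_mask": instances == region.label,
            })
    return objects
```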

[Under Construction]
37 changes: 7 additions & 30 deletions docs/openfibsem/concepts.md
@@ -273,7 +273,7 @@ In order to efficiently improve the model, as well as automatically generate tes

This form of active learning allows us to collect images that the model is currently failing on, rather than feeding it more images it already succeeds at. This is crucial for balancing the dataset and improving efficiently when the dataset is relatively small (~100s of images).

For more details on this approach to data curation, please see [AutoLamella Datasets and Models](../autolamella/case_study_dataset_and_models.md)
For more details on this approach to data curation, please see [AutoLamella Datasets and Models](../autolamella/ml.md)

### Data Labelling

@@ -282,7 +282,7 @@ We developed a napari plugin for labelling images for semantic segmentation. The
[Image Labelling Napari Plugin]
[Example Labelling GIF for each mode]

For details about how this was used, please see [AutoLamella Datasets and Models](../autolamella/case_study_dataset_and_models.md)
For details about how this was used, please see [AutoLamella Datasets and Models](../autolamella/ml.md)

### Manual Labelling

@@ -298,37 +298,14 @@ To use, go to the Model tab and load your model, and then tick 'model assisted'

### SegmentAnything Assisted Labelling

We have implemented the Segment Anything Model from MetaAI. This model is trained to segment any object. Here we use it as part of the model assisted labelling. We currently support two versions of SAM; SegmentAnything and MobileSAM.
We have implemented the Segment Anything Model from MetaAI. This model is trained to segment any object. Here we use it as part of the model assisted labelling. We currently support the huggingface transformers implementation for SAM. You can use any compatible SAM model.

SegmentAnything is the original model from MetaAI. It is powerful, but requires a decently large GPU.
MobileSAM is a recent, faster implementation of SAM, that can be run without a large GPU.
The recommended models are:

To use either model, you will need to install some additional dependencies, and download the model weights as described below.
- Large GPU: facebook/sam-vit-base or facebook/sam-vit-large
- Small GPU / CPU: Zigeng/SlimSAM-uniform-50

##### Segment Anything

For more detailed about SAM see: <https://github.com/facebookresearch/segment-anything>

To use SAM:

```python
pip install git+https://github.com/facebookresearch/segment-anything.git
pip install opencv-python pycocotools matplotlib onnxruntime onnx

```

Download weights: [SAM ViT-H](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth)

##### MobileSAM

The labelling UI also supports using MobileSAM which is a faster version of Segment Anything (+ less gpu memory).

``` bash
pip install git+https://github.com/ChaoningZhang/MobileSAM.git

```

Download weights: [MobileSAM ViT-T](https://drive.google.com/file/d/1dE-YAG-1mFCBmao2rHDp0n-PP4eH7SjE/view?usp=sharing)
Instructions for using SAM assisted labelling are shown in the side panel.

### Model Training

5 changes: 5 additions & 0 deletions docs/openfibsem/roadmap.md
@@ -0,0 +1,5 @@
# Roadmap

## Version 1.0

[Under Construction]
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -61,13 +61,14 @@ nav:
- Concepts: openfibsem/concepts.md
- User Guide: openfibsem/user_guide.md
- Examples: openfibsem/examples.md
- Roadmap: openfibsem/roadmap.md
- API Reference: openfibsem/reference.md
- AutoLamella:
- autolamella/index.md
- Getting Started: autolamella/getting_started.md
- Motivation: autolamella/motivation.md
- User Guide: autolamella/user_guide.md
- Case Study - Machine Learning: autolamella/case_study_datasets_and_models.md
- Machine Learning: autolamella/ml.md
- Case Study - Serial Liftout: autolamella/case_study_serial_liftout.md
- Blog:
- blog/index.md
