update autolamella ml page
patrickcleeve2 committed Jan 14, 2024
1 parent db0e1cb commit 088d84b
Showing 11 changed files with 213 additions and 44 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -158,3 +158,6 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
docs/autolamella/assets/manual_labelling.webm
docs/autolamella/assets/model_assisted_labelling.webm
docs/autolamella/assets/sam_assisted_labelling.webm
Binary file added docs/autolamella/assets/show_liftout.png
Binary file added docs/autolamella/assets/show_serial_liftout.png
Binary file added docs/autolamella/assets/show_waffle.png
13 changes: 0 additions & 13 deletions docs/autolamella/case_study_dataset_and_models.md

This file was deleted.

196 changes: 196 additions & 0 deletions docs/autolamella/ml.md
@@ -0,0 +1,196 @@
# Machine Learning

Machine Learning is a core part of AutoLamella. Here we highlight the datasets, models, and tooling we have developed as part of the project.

## Dataset

### AutoLamella Dataset

The AutoLamella dataset consists of images from multiple lamella preparation methods. All data is annotated for semantic segmentation, and is available through the Hugging Face API at [patrickcleeve/autolamella](https://huggingface.co/datasets/patrickcleeve/autolamella).

Summary

| Dataset / Method | Train | Test | Total |
| ----------- | ----------- | -----------| -----------|
| Waffle | 214 | 76 | 290 |
| Liftout | 801 | 168 | 969 |
| Serial Liftout | 301 | 111 | 412 |
| **Full** | **1316** | **355** | **1671** |


Details about the datasets can be found in `summary.csv` in the dataset directory.

### Labels

Currently, the dataset is labelled for the following classes. In the future, we will add additional labels for objects such as ice contamination. If you would like to label this data, please see the labelling tools to get started.

```yaml
CLASS_LABELS: # autolamella
0: "background"
1: "lamella"
2: "manipulator"
3: "landing_post"
4: "copper_adaptor"
5: "volume_block"
```
#### Downloading Data
To download datasets, you can use the Hugging Face API:
```python

from datasets import load_dataset

# download waffle dataset
ds = load_dataset("patrickcleeve/autolamella", name="waffle")

# download liftout dataset
ds = load_dataset("patrickcleeve/autolamella", name="liftout")

# download serial-liftout dataset
ds = load_dataset("patrickcleeve/autolamella", name="serial-liftout")

# download test split only
ds = load_dataset("patrickcleeve/autolamella", name="waffle", split="test")

```

To display images and annotations:

```python
# show a random image and its annotation (training split)
import random
import numpy as np
import matplotlib.pyplot as plt
from fibsem.segmentation.utils import decode_segmap_v2

# pick a random sample (randint is inclusive, so subtract 1 from the length)
idx = random.randint(0, len(ds["train"]) - 1)
image = np.asarray(ds["train"][idx]["image"])
mask = np.asarray(ds["train"][idx]["annotation"])

# metadata
split = ds["train"].split
config_name = ds["train"].config_name

plt.title(f"{config_name}-{split}-{idx:02d}")
plt.imshow(image, cmap="gray", alpha=0.7)
plt.imshow(decode_segmap_v2(mask), alpha=0.3)
plt.axis("off")
plt.show()

```

| Waffle | Liftout | Serial Liftout |
| ----------- | ----------- | ----------- |
| ![WaffleData](assets/show_waffle.png) | ![LiftoutData](assets/show_liftout.png) | ![SerialLiftoutData](assets/show_serial_liftout.png) |



### Acknowledgements

- Waffle and Liftout data from Monash
- Serial Liftout data from MPI


### Dataset Format

Image data is stored as 8-bit grayscale TIFFs, and labels are saved as class index maps (also 8-bit TIFFs). Corresponding images and labels share the same filename, with the labels located in the labels/ directory.

The file layout of the dataset is:

```bash
autolamella-dataset/
summary.csv
autolamella-waffle/
train/
image-0000.tif
labels/
image-0000.tif
test/
autoliftout/
serial-liftout/
```
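As a minimal sketch of loading a matching image/label pair directly from disk (assuming `tifffile` is installed; the paths below are illustrative):

```python
import os

import numpy as np
import tifffile as tf

# illustrative paths; adjust to wherever the dataset was downloaded
data_dir = "autolamella-dataset/autolamella-waffle/train"
fname = "image-0000.tif"

# image and label share the same filename; labels live in labels/
image = tf.imread(os.path.join(data_dir, fname))            # 8-bit grayscale image
label = tf.imread(os.path.join(data_dir, "labels", fname))  # 8-bit class index map

print(image.shape, image.dtype)  # e.g. (1024, 1536) uint8
print(np.unique(label))          # class indices present, e.g. [0 1 2]
```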

### Keypoint Dataset

The keypoint dataset is used for model evaluation, to mimic as closely as possible the online performance of the models when running on an actual FIBSEM.

[Under Construction]
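In the meantime, an illustrative sketch of the idea (not the project's actual pipeline): a keypoint such as the lamella centre can be derived from a segmentation mask, assuming `scikit-image` is installed:

```python
import numpy as np
from skimage.measure import label, regionprops

LAMELLA_CLASS = 1  # from CLASS_LABELS above

def mask_to_keypoint(mask: np.ndarray, class_idx: int = LAMELLA_CLASS):
    """Return the centroid (row, col) of the largest region of the given class."""
    regions = regionprops(label(mask == class_idx))
    if not regions:
        return None  # class not present in this mask
    largest = max(regions, key=lambda r: r.area)
    return largest.centroid
```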

## Models

### Baseline Models

We have trained models for each method on the corresponding subset of the data. Models are available on Hugging Face at patrickcleeve/openfibsem-baseline, which also includes archived development models. You can try out a demo of these models in the [autolamella-demo space](https://huggingface.co/spaces/patrickcleeve/autolamella-demo).

The current best performing models for each method are:

| Method | Dataset | Checkpoint |
| ----------- | ----------- | -----------|
| AutoLamella Waffle | Waffle | autolamella-waffle-20240107.pt |
| AutoLamella Liftout | Full | autolamella-mega-20240107.pt |
| AutoLamella Serial Liftout | Serial Liftout | autolamella-serial-liftout-20240107.pt |
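Checkpoints can be fetched with the `huggingface_hub` client; a minimal sketch (the checkpoint filename comes from the table above):

```python
from huggingface_hub import hf_hub_download

# download the waffle checkpoint from the baseline repository
checkpoint_path = hf_hub_download(
    repo_id="patrickcleeve/openfibsem-baseline",
    filename="autolamella-waffle-20240107.pt",
)
print(checkpoint_path)  # local cached path to the .pt file
```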

### Mega Model

We are developing a combined model for all of these methods, called the mega model.

Currently the mega model is outperformed by the specific waffle and serial liftout models, but performs better for liftout. Our initial thinking is that the waffle and serial liftout datasets each come from a single sample (and, in the serial liftout case, a single run), so the specialised models are likely overfit to those conditions. The liftout dataset, by contrast, contains multiple samples and a much more diverse set of images, and so benefits more from the increased size and variation of the combined dataset.

Therefore, we are very keen to incorporate more varied training data for the other methods. If you have additional data you would like to contribute to this model, please get in contact.

## Tools

This section highlights some of the tools developed.

For specific details about concepts and terminology, please refer to the [Concepts Page](../openfibsem/concepts.md).

### Data Collection and Curation (Supervised Mode)

When running AutoLamella, imaging data is logged and saved for later use. In supervised mode, every correction the user makes is logged and classified. We use this feedback to automatically split the dataset and to generate a set of evaluation keypoints for testing. This form of active learning enables efficient model improvement by only including data that the model fails to predict correctly.
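A minimal sketch of this curation logic (the record structure and threshold are hypothetical, not the actual AutoLamella implementation): detections the user corrected become training data, while the rest become evaluation keypoints.

```python
from dataclasses import dataclass

@dataclass
class DetectionRecord:
    image_path: str
    predicted_xy: tuple[float, float]  # model's detected keypoint
    corrected_xy: tuple[float, float]  # position after user supervision

def curate(records: list[DetectionRecord], tol: float = 5.0):
    """Split logged detections: corrections indicate model failures,
    which are the most valuable images to add to the training set."""
    train, evaluation = [], []
    for rec in records:
        dx = rec.corrected_xy[0] - rec.predicted_xy[0]
        dy = rec.corrected_xy[1] - rec.predicted_xy[1]
        moved = (dx**2 + dy**2) ** 0.5 > tol  # user moved the detection
        (train if moved else evaluation).append(rec)
    return train, evaluation
```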

### Data Labelling (Model Assisted Labelling)

We developed a napari plugin for labelling images for semantic segmentation. The plugin supports manual labelling and model assisted labelling.

Model assisted labelling supports two kinds of model: segmentation models and segment anything models. Segmentation models allow pre-labelling of data using a pretrained segmentation model, while segment anything labelling uses a generic Segment Anything model driven by user prompts. The user can then manually edit the segmentation to correct any model errors.

In practice, these two modes are useful at different stages of development. For example, when labelling a new dataset, or one where model performance is poor, Segment Anything lets you quickly generate segmentation masks with interactive prompting. Once you have a decent model, segmentation model labelling is much faster, and usually only requires minor edits from the user.
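For a flavour of the prompted workflow, here is a sketch of point-prompted segmentation using the Hugging Face `transformers` SAM implementation (assuming `transformers` and `torch` are installed; the image path and prompt coordinates are illustrative):

```python
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
model = SamModel.from_pretrained("facebook/sam-vit-base").to(device)

image = Image.open("image-0000.tif").convert("RGB")  # illustrative filename
input_points = [[[450, 600]]]  # one (x, y) point prompt on the object

inputs = processor(image, input_points=input_points, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

# resize the predicted masks back to the original image size
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
print(masks[0].shape)  # (num_prompts, num_masks, H, W)
```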

#### Example: Serial Liftout Data

When we received the serial liftout data, we didn't have any models trained for it, so it was much faster to use SAM to label the training dataset. Once an initial model was trained, we switched to using the segmentation model and manually editing the masks.

![Segment Anything Labelling](assets/sam_assisted_labelling.gif)
Segment Anything Labelling

![Model Assisted Labelling](assets/model_assisted_labelling.gif)
Model Assisted Labelling

With the assistance of this tooling, the full dataset (all methods) now contains around 1600 labelled images.

### Model Evaluation

We provide tools for evaluating the performance of a number of different models on the keypoint detection task.

The evaluation runs each model checkpoint through the detection pipeline, saves the results, and compares them to the provided ground truth labels. Each individual image can be plotted, as well as the full evaluation statistics. This evaluation pipeline is useful for checking model improvement and preventing regressions on previously successful tasks.
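At its core, the comparison reduces to distances between predicted and ground truth keypoints; a sketch of such a metric (the helper and threshold are hypothetical, not the full pipeline):

```python
import numpy as np

def keypoint_errors(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Euclidean error between predicted and ground truth keypoints.

    pred, gt: arrays of shape (N, 2), in pixel coordinates.
    """
    dist = np.linalg.norm(pred - gt, axis=1)
    return {
        "mean_px": float(dist.mean()),
        "median_px": float(np.median(dist)),
        "pct_under_25px": float((dist < 25).mean()),  # illustrative threshold
    }
```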

Here we are showing evaluations from the development models.

[TODO]

## Experimental

### Generating Labels

From the segmentation masks, you can generate other useful labelled objects. The script `generate_segmentation_objects.py` generates bounding boxes and instance segmentations for each object in the mask.

These should be straightforward to convert to COCO format for training.
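A sketch of the underlying conversion using `scikit-image` (illustrative only; see the script itself for the actual implementation):

```python
import numpy as np
from skimage.measure import label, regionprops

def mask_to_objects(mask: np.ndarray) -> list[dict]:
    """Derive per-object bounding boxes and instance masks from a class index map."""
    objects = []
    for class_idx in np.unique(mask):
        if class_idx == 0:  # skip background
            continue
        instances = label(mask == class_idx)  # connected components per class
        for region in regionprops(instances):
            y0, x0, y1, x1 = region.bbox
            objects.append({
                "class": int(class_idx),
                "bbox": [x0, y0, x1 - x0, y1 - y0],  # COCO-style [x, y, w, h]
                "instance_mask": instances == region.label,
            })
    return objects
```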

[Under Construction]
37 changes: 7 additions & 30 deletions docs/openfibsem/concepts.md
@@ -273,7 +273,7 @@ In order to efficiently improve the model, as well as automatically generate tes

This form of active learning allows us to collect images that the model is currently failing on, rather than feeding it more images it already succeeds at. This is crucial for balancing the dataset and improving efficiently when the dataset is relatively small (~100s of images).

For more details on this approach to data curation, please see [AutoLamella Datasets and Models](../autolamella/case_study_dataset_and_models.md)
For more details on this approach to data curation, please see [AutoLamella Datasets and Models](../autolamella/ml.md)

### Data Labelling

@@ -282,7 +282,7 @@ We developed a napari plugin for labelling images for semantic segmentation. The
[Image Labelling Napari Plugin]
[Example Labelling GIF for each mode]

For details about how this was used, please see [AutoLamella Datasets and Models](../autolamella/case_study_dataset_and_models.md)
For details about how this was used, please see [AutoLamella Datasets and Models](../autolamella/ml.md)

### Manual Labelling

@@ -298,37 +298,14 @@ To use, go to the Model tab and load your model, and then tick 'model assisted'

### SegmentAnything Assisted Labelling

We have implemented the Segment Anything Model from MetaAI. This model is trained to segment any object. Here we use it as part of the model assisted labelling. We currently support two versions of SAM; SegmentAnything and MobileSAM.
We have implemented the Segment Anything Model from MetaAI. This model is trained to segment any object. Here we use it as part of the model assisted labelling. We currently support the huggingface transformers implementation for SAM. You can use any compatible SAM model.

SegmentAnything is the original model from MetaAI. It is powerful, but requires a decently large GPU.
MobileSAM is a recent, faster implementation of SAM, that can be run without a large GPU.
The recommended models are:

To use either model, you will need to install some additional dependencies, and download the model weights as described below.
- Large GPU: facebook/sam-vit-base or facebook/sam-vit-large
- Small GPU / CPU: Zigeng/SlimSAM-uniform-50

##### Segment Anything

For more detailed about SAM see: <https://github.com/facebookresearch/segment-anything>

To use SAM:

```python
pip install git+https://github.com/facebookresearch/segment-anything.git
pip install opencv-python pycocotools matplotlib onnxruntime onnx

```

Download weights: [SAM ViT-H](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth)

##### MobileSAM

The labelling UI also supports using MobileSAM which is a faster version of Segment Anything (+ less gpu memory).

``` bash
pip install git+https://github.com/ChaoningZhang/MobileSAM.git

```

Download weights: [MobileSAM ViT-T](https://drive.google.com/file/d/1dE-YAG-1mFCBmao2rHDp0n-PP4eH7SjE/view?usp=sharing)
Instructions for using SAM assisted labelling are shown in the side panel.

### Model Training

5 changes: 5 additions & 0 deletions docs/openfibsem/roadmap.md
@@ -0,0 +1,5 @@
# Roadmap

## Version 1.0

[Under Construction]
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -61,13 +61,14 @@ nav:
- Concepts: openfibsem/concepts.md
- User Guide: openfibsem/user_guide.md
- Examples: openfibsem/examples.md
- Roadmap: openfibsem/roadmap.md
- API Reference: openfibsem/reference.md
- AutoLamella:
- autolamella/index.md
- Getting Started: autolamella/getting_started.md
- Motivation: autolamella/motivation.md
- User Guide: autolamella/user_guide.md
- Case Study - Machine Learning: autolamella/case_study_datasets_and_models.md
- Machine Learning: autolamella/ml.md
- Case Study - Serial Liftout: autolamella/case_study_serial_liftout.md
- Blog:
- blog/index.md
