Annotations utils #18

Open
3 of 4 tasks
sfmig opened this issue Dec 10, 2024 · 2 comments
Comments


sfmig commented Dec 10, 2024

  • Read VIA/COCO json annotation files
  • Save as VIA/COCO json annotation file
  • Utils
    • Combine annotation files
    • Filter out bboxes based on min/max area
    • Check for duplicate bboxes and (optionally) correct
      • see movement validations for bboxes dataset
    • Print and/or plot summary statistics for manual annotations
    • Filter out bboxes if boundaries are outside image?
    • Filter out bboxes if inside a certain region (use shapely)
    • Convert between VIA and COCO json
      • via "standard" dataframe

Probably a good idea to handle each outer bullet point in a separate PR.
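The duplicate-check utility from the list above could be sketched with plain pandas. This is a minimal sketch, not the module's implementation; the standard-dataframe column names (`image_filename`, `x`, `y`, `width`, `height`) are assumptions:

```python
import pandas as pd

# Hypothetical standard bbox dataframe: one row per annotation.
# Column names are assumptions about the "standard" dataframe format.
df = pd.DataFrame(
    {
        "image_filename": ["img1.png", "img1.png", "img2.png"],
        "x": [10.0, 10.0, 50.0],
        "y": [20.0, 20.0, 60.0],
        "width": [30.0, 30.0, 40.0],
        "height": [30.0, 30.0, 40.0],
    }
)

bbox_cols = ["image_filename", "x", "y", "width", "height"]

# flag duplicates: same image and identical bbox geometry
is_duplicate = df.duplicated(subset=bbox_cols)
n_duplicates = int(is_duplicate.sum())

# (optionally) correct by dropping the duplicated rows
df_clean = df.drop_duplicates(subset=bbox_cols)
```

A stricter check (near-duplicates with high IoU rather than exact matches) would need a pairwise comparison instead of `duplicated`.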


sfmig commented Dec 17, 2024

The scripts shared on the VIA site may be useful to check.
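For reference, a minimal sketch of what parsing VIA rectangle annotations into a flat dataframe could look like. The `_via_img_metadata` / `shape_attributes` layout follows the VIA 2.x project json, but treat both that layout and the output column names as assumptions to verify against those scripts:

```python
import pandas as pd

# Minimal VIA-style annotation payload (assumed VIA 2.x project json layout)
via_data = {
    "_via_img_metadata": {
        "img1.png-1": {
            "filename": "img1.png",
            "regions": [
                {
                    "shape_attributes": {
                        "name": "rect",
                        "x": 10,
                        "y": 20,
                        "width": 30,
                        "height": 40,
                    }
                }
            ],
        }
    }
}

# flatten rectangle regions into one row per bbox
rows = []
for img in via_data["_via_img_metadata"].values():
    for region in img["regions"]:
        shape = region["shape_attributes"]
        if shape.get("name") == "rect":
            rows.append(
                {
                    "image_filename": img["filename"],
                    "x": shape["x"],
                    "y": shape["y"],
                    "width": shape["width"],
                    "height": shape["height"],
                }
            )

df_bboxes = pd.DataFrame(rows)
```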


sfmig commented Jan 31, 2025

From PR #31

Mock script

A mock script of the annotations module
import pandas as pd
from shapely import Point, Polygon

from ethology.annotations.curation import (
    remove_inside_polygon,
    remove_outside_image,
)
from ethology.annotations.io import (
    df_bboxes_from_file,
    df_bboxes_to_COCO_file,
    df_bboxes_to_VIA_file,
    df_keypoints_from_file,
)
from ethology.annotations.transforms import (
    compute_bboxes_from_keypoints,
    compute_centroid_from_keypoints,
    compute_masks_from_keypoints,
)

############################################
## Read data from two files, option 1
df_bboxes_1 = df_bboxes_from_file("path/to/annotations_1.json")
df_bboxes_2 = df_bboxes_from_file("path/to/annotations_2.json")

# combine dataframes
df_bboxes = pd.concat([df_bboxes_1, df_bboxes_2])
# and remove duplicates
df_bboxes = df_bboxes.drop_duplicates()

############################################
## Read data from two files, option 2
# or: we combine and remove duplicates in one step
df_bboxes = df_bboxes_from_file(
    ["path/to/annotations_1.json", "path/to/annotations_2.json"]
)

############################################
# Filter out boxes whose boundaries fall outside the image
# (x, y) is the bbox centroid, so the edges are at x ± 0.5*width, y ± 0.5*height;
# keep only boxes fully inside the image
image_size = (1000, 1000)
df_bboxes = df_bboxes[
    df_bboxes["x"] + 0.5 * df_bboxes["width"] <= image_size[0]
]
df_bboxes = df_bboxes[
    df_bboxes["y"] + 0.5 * df_bboxes["height"] <= image_size[1]
]
df_bboxes = df_bboxes[df_bboxes["x"] - 0.5 * df_bboxes["width"] >= 0]
df_bboxes = df_bboxes[df_bboxes["y"] - 0.5 * df_bboxes["height"] >= 0]

# or:
df_bboxes = remove_outside_image(df_bboxes, image_size)

############################################
# Filter out boxes that are within a specified polygon using shapely
polygon = Polygon([(0, 0), (0, 100), (100, 100), (100, 0)])
df_bboxes = df_bboxes[
    df_bboxes.apply(
        lambda x: not polygon.contains(Point(x["x"], x["y"])), axis=1
    )
]
# or
df_bboxes = remove_inside_polygon(
    df_bboxes, polygon
)  # with default columns to check in the dataframe: 'x', 'y'

############################################
# Print summary statistics
print(df_bboxes.describe())

############################################
# Export data

#  as a VIA json file
df_bboxes_to_VIA_file(df_bboxes, "path/to/output_VIA_file.json")

# as a COCO json file
df_bboxes_to_COCO_file(df_bboxes, "path/to/output_COCO_file.json")

# as a csv file with specified header
# df_bboxes.to_csv("path/to/output_csv_file.csv", header=["x", "y", "width", "height"])

############################################
# Transforms module
# other nice-to-have-in-the-future

# read SLEAP annotated data
df_keypoints = df_keypoints_from_file(
    "path/to/annotations_keypoints.slp",
    source_software="SLEAP",
)  # reads data into df with standard headings

# compute bounding boxes from keypoints
df_bboxes = compute_bboxes_from_keypoints(df_keypoints)

# compute RLE masks from keypoints
# (could prompt SAM2 under the hood?)
df_masks = compute_masks_from_keypoints(df_keypoints)
# plus additional arguments to determine the buffer, etc.

# compute centroid from keypoints
df_centroid_kpts = compute_centroid_from_keypoints(df_keypoints)


# read idtracker data
df_keypoints = df_keypoints_from_file(
    "path/to/annotations_idtracker.csv",  # or .npy or .json
    source_software="idtracker",
)  # reads data into df with standard headings

# compute bounding boxes from keypoints
df_bboxes = compute_bboxes_from_keypoints(df_keypoints)
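The min/max-area filter from the task list isn't covered by the mock script; a minimal sketch, reusing the same assumed standard-dataframe columns (`width`, `height`) and hypothetical thresholds, could be:

```python
import pandas as pd

# Hypothetical bbox dataframe using the assumed standard columns
df_bboxes = pd.DataFrame(
    {
        "x": [5.0, 50.0, 80.0],
        "y": [5.0, 50.0, 80.0],
        "width": [2.0, 20.0, 200.0],
        "height": [2.0, 20.0, 200.0],
    }
)

# assumed area thresholds, in pixels^2
min_area, max_area = 10.0, 10_000.0

# keep boxes whose area is within [min_area, max_area]
area = df_bboxes["width"] * df_bboxes["height"]
df_filtered = df_bboxes[(area >= min_area) & (area <= max_area)]
```

Here the three boxes have areas 4, 400 and 40000 px², so only the middle one survives the filter.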
