This repository includes resources for The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting (ECCV 2022):
- Links to download the dataset and annotations
- Evaluation code to reproduce our results and evaluate new algorithms
- A script to convert raw sonar frames into the enhanced format used by the Baseline++ method in the paper
Data can be downloaded from CaltechDATA using the following links.
Training, validation, and testing images [123 GB]
- Running `md5sum` on the tar.gz file should produce: 176648e618fc5013db972aa7ded01517 fish_counting_frames.tar.gz
We also make available a Tiny Dataset [1.44 GB], a subset of the full dataset intended for exploring the data without downloading everything. It includes data from all 6 river locations/subsets (elwha, kenai-channel, kenai-rightbank, kenai-train, kenai-val, nushagak), with 20 clips (videos) from each river and up to 50 consecutive frames from each clip. The subdirectory and file structure follow the formats listed below.
- Running `md5sum` on the tar.gz file should produce: 6ae6062d50a90d4b084ebe476a392c0e tiny_dataset.tar.gz
Metadata
- Running `md5sum` on the tar.gz file should produce: 152286bd6f25f965aadf41e8a0c44140 fish_counting_metadata.tar.gz
MOT-style Annotations
- Running `md5sum` on the tar.gz file should produce: 34c6bbd5e9187f05bfc0be72df002b19 fish_counting_annotations.tar.gz
COCO-style Annotations [11 MB]
- Running `md5sum` on the tar.gz file should produce: 92694f9df235cf1e089f65e06faa9c82 coco_formatted_annotations.tar.gz
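If `md5sum` is not available on your system, the same check can be done from Python. A minimal sketch (the filename is whichever archive you downloaded):

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file in streaming fashion."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the printed digest against the checksum listed above.
print(md5_of("fish_counting_frames.tar.gz"))
```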
Frames are provided as single-channel JPGs. After extracting the tar.gz, frames are organized as follows. Each location described in the paper -- Kenai Left Bank (training & validation), Kenai Right Bank, Kenai Channel, Nushagak, and Elwha -- has its own directory, with subdirectories for each video clip at that location.
frames/
  raw/
    kenai-train/
      One directory per video sequence in the training set.
        Images are named by frame index in the video clip, e.g. 0.jpg, 1.jpg, ...
    kenai-val/
      One directory per video sequence in the validation set.
        0.jpg, 1.jpg, ...
    kenai-rightbank/
      ...
    kenai-channel/
      ...
    nushagak/
      ...
    elwha/
      ...
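As an illustration of this layout, here is a minimal sketch that walks the clips for one location and counts their frames; the root path and location name are assumptions, so adjust them to wherever you extracted the archive:

```python
import os

FRAMES_ROOT = "frames/raw"   # assumed extraction location
LOCATION = "kenai-val"       # any of the six location directories

location_dir = os.path.join(FRAMES_ROOT, LOCATION)
for clip_name in sorted(os.listdir(location_dir)):
    clip_dir = os.path.join(location_dir, clip_name)
    if not os.path.isdir(clip_dir):
        continue
    # Frames are named by integer index, so sort numerically rather than lexically.
    frame_files = sorted(
        (f for f in os.listdir(clip_dir) if f.endswith(".jpg")),
        key=lambda f: int(os.path.splitext(f)[0]),
    )
    print(clip_name, len(frame_files), "frames")
```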
The 3-channel frames used by our Baseline++ method can be generated using convert.py:
python convert.py --in_dir PATH/TO/frames/raw --out_dir PATH/TO/OUTPUT/DIRECTORY
The directory structure will be maintained.
Clip-level metadata is provided for each video clip in the dataset. One JSON file is provided for each location: kenai-train.json, kenai-val.json, kenai-rightbank.json, kenai-channel.json, nushagak.json, and elwha.json. Each JSON file contains a list of dictionaries, one per video clip. Each entry contains the following metadata:
{ // Information for a single clip
  "clip_name":          // Unique ID for this clip; matches the name of the directory containing image frames
  "num_frames":         // Number of frames in the video clip
  "upstream_direction": // Either `left` or `right`
  "width":              // Image width in pixels
  "height":             // Image height in pixels
  "framerate":          // Video frame rate, in frames per second
  "x_meter_start":      // Meter distance from the sonar camera at x = 0
  "x_meter_stop":       // Meter distance from the sonar camera at x = width-1
  "y_meter_start":      // Meter distance from the sonar camera at y = 0
  "y_meter_stop":       // Meter distance from the sonar camera at y = height-1
}
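As a usage sketch, the snippet below loads one metadata file and maps a pixel column to a distance in meters. The file path is an example, and the linear interpolation between x_meter_start and x_meter_stop is an assumption implied by those fields:

```python
import json

# Load clip metadata for one location (path is an assumption; adjust to your extraction).
with open("metadata/kenai-val.json") as f:
    clips = json.load(f)

def x_pixel_to_meters(x, clip):
    # Assumes a linear mapping between x = 0 and x = width - 1,
    # as suggested by the x_meter_start / x_meter_stop fields.
    frac = x / (clip["width"] - 1)
    return clip["x_meter_start"] + frac * (clip["x_meter_stop"] - clip["x_meter_start"])

clip = clips[0]
print(clip["clip_name"], x_pixel_to_meters(clip["width"] // 2, clip), "meters at image center")
```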
We provide annotations in COCO format as well as MOTChallenge format. For both, we use the default directory structure as described here. After extracting the tar.gz, the directory structure is as follows:
annotations/
  kenai-train/
    One directory per video sequence in the training set.
      gt.txt
  kenai-val/
    One directory per video sequence in the validation set.
      gt.txt
  kenai-rightbank/
    ...
  kenai-channel/
    ...
  nushagak/
    ...
  elwha/
    ...
Following the MOTChallenge format, each gt.txt file contains one entry per track per frame. Each line contains 10 values:
<frame_number>, <track_id>, <bb_left>, <bb_top>, <bb_width>, <bb_height>, <conf>, <x>, <y>, <z>
The world coordinates x, y, z are ignored for 2D data and are filled with -1. For ground truth tracks, conf=-1 as well. All frame numbers, target IDs, and bounding boxes are 1-indexed (i.e. the minimum bb_left and bb_top values are 1, not 0). Here is an example:
1, 3, 794.27, 247.59, 71.245, 174.88, -1, -1, -1, -1
1, 6, 1648.1, 119.61, 66.504, 163.24, -1, -1, -1, -1
1, 8, 875.49, 399.98, 95.303, 233.93, -1, -1, -1, -1
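A minimal parsing sketch for these files (the path in the usage comment is a placeholder):

```python
import csv

def load_mot_annotations(gt_path):
    """Parse a MOTChallenge-format gt.txt into a list of dicts."""
    entries = []
    with open(gt_path) as f:
        for row in csv.reader(f):
            frame, track_id = int(row[0]), int(row[1])
            bb_left, bb_top, bb_w, bb_h = map(float, row[2:6])
            entries.append({
                "frame": frame,                          # 1-indexed frame number
                "track_id": track_id,
                "bbox": (bb_left, bb_top, bb_w, bb_h),   # 1-indexed pixel coordinates
            })
    return entries

# Example usage (placeholder path):
# tracks = load_mot_annotations("annotations/kenai-val/{clip_name}/gt.txt")
```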
coco_formatted_annotations/
  kenai-train/
    coco.json
  kenai-val/
    coco.json
  kenai-rightbank/
    ...
  kenai-channel/
    ...
  nushagak/
    ...
  elwha/
    ...
Each coco.json file includes the standard COCO information for image and annotation labels, with the addition that each image entry also has a dir_name field corresponding to the directory name of the video sequence it belongs to. For example:
{
  "images": [
    {
      "dir_name": "2018-06-01-JD152_LeftFar_Stratum2_Set1_LO_2018-06-01_191003_0_280",
      "file_name": "0.jpg",
      "height": 1915,
      "width": 781,
      "id": 0
    },
    ...
  ],
  "categories": [
    {
      "supercategory": "",
      "id": 1,
      "name": "fish"
    }
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 17,
      "bbox": [
        241.0,
        1851.0,
        15.0,
        10.0
      ],
      "area": 150,
      "iscrowd": 0,
      "category_id": 1
    },
    ...
  ]
}
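A short sketch for working with these files, e.g. grouping annotations by image (the path is an example; adjust to your extraction):

```python
import json
from collections import defaultdict

# Load one COCO-style annotation file and index annotations by image id.
with open("coco_formatted_annotations/kenai-val/coco.json") as f:
    coco = json.load(f)

annotations_by_image = defaultdict(list)
for ann in coco["annotations"]:
    annotations_by_image[ann["image_id"]].append(ann)

for img in coco["images"][:5]:
    boxes = annotations_by_image[img["id"]]
    print(img["dir_name"], img["file_name"], len(boxes), "fish boxes")
```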
We provide output from our Baseline and Baseline++ methods in MOTChallenge format as well.
ECCV22 Baseline Results [18 MB]
- Running `md5sum` on the tar.gz file should produce: ef8d517ad45419edce7af2e7dc5016be fish_counting_results.tar.gz
Note that the directory structure for predictions is different from the ground truth annotations. After extracting the tar.gz, the directory structure is as follows:
results/
  kenai-val/
    baseline/
      data/
        One text file per clip, named {clip_name}.txt
    baseline++/
      data/
        One text file per clip, named {clip_name}.txt
  kenai-rightbank/
    ...
  kenai-channel/
    ...
  nushagak/
    ...
  elwha/
    ...
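Since these prediction files are also in MOTChallenge format, they can be parsed the same way as gt.txt. The sketch below prints a naive per-clip count (number of unique track IDs); this is purely illustrative and is not the counting procedure evaluated in the paper, which is computed by evaluate.py:

```python
import csv
import glob
import os

# Naive illustration: count unique track IDs per tracker output file.
# This is NOT the paper's counting metric; use evaluate.py for that.
results_dir = "results/kenai-val/baseline/data"   # path is an assumption
for path in sorted(glob.glob(os.path.join(results_dir, "*.txt"))):
    track_ids = set()
    with open(path) as f:
        for row in csv.reader(f):
            track_ids.add(int(row[1]))
    print(os.path.basename(path), len(track_ids), "tracks")
```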
Clone the repo with submodules to enable MOT evaluation:
git clone --recursive https://github.com/visipedia/caltech-fish-counting.git
# or
git clone --recursive git@github.com:visipedia/caltech-fish-counting.git
If you already cloned, submodules can be retroactively initialized with:
git submodule init
git submodule update
We provide evaluation code using the TrackEval codebase. In addition to the CLEAR, ID, and HOTA tracking metrics, we extend the TrackEval codebase with a custom metric, nMAE, as described in the paper:
$$\mathrm{nMAE} = \frac{\sum_{i=1}^{N} E_i}{\sum_{i=1}^{N} \hat{z}_i}$$

where $N$ is the number of video clips and $E_i$ is the absolute counting error for each clip, with $z_i$ denoting predicted counts, $\hat{z}_i$ ground truth counts, and the superscripts the two upstream directions (left and right):

$$E_i = \left| \hat{z}_i^{\mathrm{left}} - z_i^{\mathrm{left}} \right| + \left| \hat{z}_i^{\mathrm{right}} - z_i^{\mathrm{right}} \right|$$

and $\hat{z}_i$ is the ground truth count for clip $i$:

$$\hat{z}_i = \hat{z}_i^{\mathrm{left}} + \hat{z}_i^{\mathrm{right}}$$
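A small worked example of the metric (illustrative only; the per-clip counts below are made up, and the official implementation is the TrackEval extension used by evaluate.py):

```python
# Illustrative nMAE computation (not the TrackEval implementation).
# Each entry holds (predicted_left, predicted_right, gt_left, gt_right) for one clip;
# the numbers are invented for the example.
clips = [
    (3, 1, 4, 1),
    (0, 2, 0, 2),
    (5, 0, 6, 1),
]

total_error = sum(abs(pl - gl) + abs(pr - gr) for pl, pr, gl, gr in clips)
total_gt = sum(gl + gr for _, _, gl, gr in clips)
nmae = total_error / total_gt
print(f"nMAE = {nmae:.3f}")
```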
Run the evaluation script from the command line to reproduce the results from the paper:
python evaluate.py --results_dir PATH/TO/results --anno_dir PATH/TO/annotations --metadata_dir PATH/TO/metadata --tracker baseline
python evaluate.py --results_dir PATH/TO/results --anno_dir PATH/TO/annotations --metadata_dir PATH/TO/metadata --tracker baseline++
Justin Kay, Peter Kulits, Suzanne Stathatos, Siqi Deng, Erik Young, Sara Beery, Grant Van Horn, and Pietro Perona
We present the Caltech Fish Counting Dataset (CFC), a large-scale dataset for detecting, tracking, and counting fish in sonar videos. We identify sonar videos as a rich source of data for advancing low signal-to-noise computer vision applications and tackling domain generalization for multiple-object tracking (MOT) and counting. In comparison to existing MOT and counting datasets, which are largely restricted to videos of people and vehicles in cities, CFC is sourced from a natural-world domain where targets are not easily resolvable and appearance features cannot be easily leveraged for target re-identification. With over half a million annotations in over 1,500 videos sourced from seven different sonar cameras, CFC allows researchers to train MOT and counting algorithms and evaluate generalization performance at unseen test locations. We perform extensive baseline experiments and identify key challenges and opportunities for advancing the state of the art in generalization in MOT and counting.
If you find our work useful in your research, please consider citing our paper:
@inproceedings{cfc2022eccv,
author = {Kay, Justin and Kulits, Peter and Stathatos, Suzanne and Deng, Siqi and Young, Erik and Beery, Sara and Van Horn, Grant and Perona, Pietro},
title = {The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2022}
}