A toolbox for markerless tracking of body parts of animals in lab settings performing various tasks, like trail tracking, reaching in mice and various Drosophila behaviors during egg-laying (see Mathis et al. for details). There is, however, nothing specific that makes the toolbox only applicable to these tasks and/or species. The toolbox has also already been successfully applied to rats, humans, various fish species, robots, and race horses performing various tasks.
Please see www.mousemotorlab.org/deeplabcut for video demonstrations of automated tracking.
This work utilizes the feature detectors (ResNet + readout layers) of one of the state-of-the-art algorithms for human pose estimation by Insafutdinov et al., called DeeperCut, which inspired the name for our toolbox (see references below).
In our preprint we demonstrate that those feature detectors can be trained with few labeled images to achieve excellent tracking accuracy for various body parts in lab tasks. Please check it out:
"Markerless tracking of user-defined features with deep learning" by Alexander Mathis, Pranav Mamidanna, Taiga Abe, Kevin M. Cury, Venkatesh N. Murthy, Mackenzie W. Mathis* and Matthias Bethge*
- We added wiki pages to include "quick guides" for training and for the evaluation tools that we provide with the package. We still recommend becoming familiar with the code base via the demo (below) first.
- We also have a Slack group if you have questions that you feel don't fit a GitHub issue (deeplabcut.slack.com) (please email Mackenzie to join!)
A typical use case is:
A user has videos of an animal (or animals) performing a behavior and wants to extract the position of various body parts from images/video frames. Ideally these parts are visible to a human annotator, yet potentially difficult to extract by standard image processing methods due to changes in background, etc.
To solve this problem, one can train feature detectors in an end-to-end fashion. In order to do so one should:
- label points of interest (e.g. joints, snout, etc.) in distinct frames (containing different poses, individuals, etc.)
- train a deep neural network while leaving out some labeled frames to check whether it generalizes well
- once the network is trained, use it to analyze videos quickly
The key result of our paper is that one typically requires just a few labeled frames to get excellent tracking results.
The general pipeline for first time use is:
Install --> Extract frames --> Label training data --> Train DeeperCut feature detectors --> Apply your trained network to unlabeled data --> Extract trajectories for analysis.
- Hardware:
  - Computer: For reference, we use Ubuntu 16.04 LTS and run a docker container that has TensorFlow, etc. installed (*available in a future release). One should also be able to run the code on Windows or macOS (some users have already successfully done so). You will need a strong GPU with at least 8GB memory such as the NVIDIA GeForce 1080 Ti. There are no other hardware requirements. In particular, the software robustly tracks data from pretty much any camera (grayscale, color, or grayscale captured under infrared light, etc.).
- Software:
  - The toolbox is written in Python 3. You will need TensorFlow for Python 3 with GPU support (otherwise training and running are pretty slow). We used version 1.0 for the figures in the paper; later versions also work with the provided code (we tested TensorFlow versions 1.0 to 1.4, but recommend 1.0, see below). Please check your CUDA and TensorFlow installation with the line below; you can test that your GPU is being properly engaged with these additional tips.
$ python3 -c "import tensorflow as tf; sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))"
Please install:
- Install Spyder (or an equivalent IDE) and/or Jupyter Notebook
- Clone (or download) the code we provide
- You will also need to install the following Python packages (in the terminal type):
$ pip install scipy scikit-image matplotlib pyyaml easydict
$ pip install moviepy imageio tqdm tables sk-video
$ git clone https://github.com/AlexEMG/DeepLabCut.git
Anaconda is perhaps the easiest way to install Python and additional packages across various operating systems.
- If you want to run the code on our demo video, a mouse reaching video from Mathis et al., 2017, you will NOT run code from sections (0), (1), or (2) below, as we have already created labels for this video (and e.g. (0) will extract different frames that are thus not labeled).
- We recommend looking at the first notebooks and then proceeding to (3) Formatting the data below. Also note that this demo data contains so few labeled frames that one should not train the network on the corresponding data set (other than for brief testing) and expect it to work properly. It is only for demo purposes.
- The following steps document using the code with either Python scripts or in Jupyter Notebooks:
(0) Configuration of your project: Open the "myconfig.py" file and set the global variables for your dataset. (Demo users, don't edit this if you want to test on the supplied video)
(1) Selecting data to label: In the folder "Generating_a_Training_Set", the provided code allows you to select a subset of frames from one or more videos for labeling. Make sure the videos you want to use for the training set are in a sub-folder under "Generating_a_Training_Set" or change the video path accordingly in "myconfig.py".
- IDE users: Open "Step1_SelectRandomFrames_fromVideos.py" and crop the videos if the behavior of interest only happens in a subset of the frame (see Step1_SelectRandomFrames_fromVideos.py for detailed instructions; edit it in Spyder or your favorite integrated development environment (IDE) and run the script).
- Jupyter users: Use the Step1_.._demo.ipynb file. In general, the supplied Jupyter Notebook is helpful to optimize the video cropping step.
Generally speaking, one should create a training set that reflects the diversity of the behavior with respect to postures, animal identities, etc. of the data that will be analyzed. This code randomly selects frames from the videos in a temporally uniformly distributed way. This is fine when the postures vary accordingly. However, the behavior might be sparse (as in the case of reaching, where the reach and pull is very fast and the mouse is not moving much between trials). In that case, one can extract various example videos of different pulls, so that this code samples the behavior well. One should take this into account when selecting frames to label: because you label so little data, be sure your selected frames capture the full breadth of the behavior. You may want to additionally hand-select extra frames of interest.
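As a rough illustration of temporally uniform sampling, the sketch below draws frame indices uniformly at random (without replacement) from a chosen portion of a video. The function name `select_frame_indices` and its parameters are hypothetical and not part of the toolbox; use the provided Step1 script for the actual frame extraction.

```python
import random


def select_frame_indices(n_frames, n_to_pick, start=0.0, stop=1.0, seed=None):
    """Pick frame indices uniformly at random from a fraction of a video.

    start and stop are fractions of the video length (0 = first frame,
    1 = end of video), so sampling can be restricted to the part of the
    video where the behavior of interest occurs.
    """
    if not 0.0 <= start < stop <= 1.0:
        raise ValueError("need 0 <= start < stop <= 1")
    first = int(start * n_frames)
    last = int(stop * n_frames)  # exclusive upper bound
    candidates = range(first, last)
    if n_to_pick > len(candidates):
        raise ValueError("not enough frames in the selected interval")
    rng = random.Random(seed)
    # Sampling without replacement gives every candidate frame the same
    # chance of being selected, independent of its position in time.
    return sorted(rng.sample(candidates, n_to_pick))


# Example: 20 frames from the middle 80% of a 5000-frame video.
frames = select_frame_indices(5000, 20, start=0.1, stop=0.9, seed=0)
```

If the behavior is sparse, uniform sampling like this will mostly return frames without the behavior, which is why short example videos or hand-selected extra frames can help.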
(2) Label the frames:
- You should label a sufficient number of frames with the anatomical locations of your choice. For the behaviors we have tested so far, 100-200 frames gave good results (see preprint). Depending on your required accuracy and the nature of the scene statistics, more training data might be necessary. Try to label consistently similar spots (e.g. on a wrist that is very large, try to always label the same location).
- Labeling can be done in any program, but we recommend using Fiji. In Fiji one can simply open the images, create a (virtual) stack* (in brief, in Fiji: File > Import > Image Sequence > (check "virtual stack")), then use the "Multi-point Tool" to label frames. You scroll through the frames and click on as many points as you wish in the same order on each frame. Then simply measure and save the resulting .csv file (Analyze > Measure (or simply Ctrl+M)).
- You can either store one .csv file per folder that contains all body parts in a cyclical way (same, repeating order). If a particular body part is not visible in a frame, then click close to (0,0) to later exclude those labels (see description in myconfig.py for details). In this case, set multibodypartsfile=True and put the name of the corresponding csv file under multibodypartsfilename in myconfig.py. Furthermore, make sure that the sequence of body parts has exactly the same order as the cyclically labeled body parts. Alternatively, you can put one csv file per body part and store them with the names defined in the bodypart list of myconfig.py. In this case set multibodypartsfile=False.
*To open a virtual stack, see: https://imagej.nih.gov/ij/plugins/virtual-opener.html. The virtual stack is helpful when the images have different sizes. This way they are not rescaled and the label information does not need to be rescaled.
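To make the cyclic labeling scheme described above concrete, here is a minimal sketch that groups clicks recorded in a repeating body part order into one record per frame and marks clicks close to (0,0) as not visible. The function name, the example body part names, and the `invisible_boundary` threshold are assumptions for illustration; the toolbox's own conversion is done by the Step2 script with the settings in myconfig.py.

```python
def split_cyclic_labels(points, bodyparts, invisible_boundary=10):
    """Group cyclically ordered (x, y) clicks into one dict per frame.

    points: list of (x, y) tuples in click order, i.e. all body parts of
        frame 0 in a fixed order, then all body parts of frame 1, etc.
    bodyparts: body part names in the same order as they were clicked.
    invisible_boundary: clicks with x and y both below this value are
        treated as "not visible" and stored as None (hypothetical
        threshold chosen for this example).
    """
    n_parts = len(bodyparts)
    if len(points) % n_parts != 0:
        raise ValueError("number of points is not a multiple of the "
                         "number of body parts; check the click order")
    frames = []
    for i in range(0, len(points), n_parts):
        frame = {}
        for name, (x, y) in zip(bodyparts, points[i:i + n_parts]):
            visible = not (x < invisible_boundary and y < invisible_boundary)
            frame[name] = (x, y) if visible else None
        frames.append(frame)
    return frames


# Two frames, three body parts; the "joystick" is occluded in frame 1,
# so the annotator clicked close to (0,0) for it.
clicks = [(120, 85), (132, 90), (300, 210),
          (125, 88), (138, 95), (2, 3)]
labels = split_cyclic_labels(clicks, ["hand", "finger", "joystick"])
```

The length check catches the most common labeling mistake with this scheme, a skipped or extra click, which would otherwise silently shift every later label to the wrong body part.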
(3) Formatting the data I:
- IDE users: The code "Step2_ConvertingLabels2DataFrame.py" creates a data structure in pandas (stored as .h5 and .csv) combining the various labels together with the (local) file path of the images. This data structure also keeps track of who labeled the data and allows combining data from multiple labelers.
Keep in mind that the ".csv" files for each bodypart (or for multiple bodyparts), as listed in the myconfig.py file, should exist in the folder alongside the individual images.
- Jupyter users: Use the Step2_.._demo.ipynb file
(4) Checking the formatted data:
After this step, you may check whether the data was loaded correctly and all the labels are properly placed (use "Step3_CheckLabels.py").
- Jupyter users: Use the Step3_.._demo.ipynb file
(5) Formatting the data II: Next, split the labeled data into test and train sets for benchmarking ("Step4_GenerateTrainingFileFromLabelledData.py"). This step will create a ".mat" file, which is used by DeeperCut, as well as a ".yaml" file containing meta information about the DeeperCut parameters. Before this step, consider changing the parameters in 'pose_cfg.yaml'. This file also contains short descriptions of what these parameters mean. Generally speaking, pos_dist_thresh and global_scale will be of most importance. Then run the code. The script will create a folder with the training data as well as a folder for training the corresponding model in DeeperCut.
- Jupyter users: Use the Step4_.._demo.ipynb file
- The output will be two folders for train and test data (with their respective yaml files)
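The idea behind the train/test split in step (5) can be sketched as follows. This is not the code of the Step4 script; the function name and parameters are hypothetical, and the default of 0.95 is only suggested by folder names such as "trainset95shuffle1" used later in this guide.

```python
import random


def split_train_test(n_images, train_fraction=0.95, seed=1):
    """Randomly split image indices into a training and a test set."""
    indices = list(range(n_images))
    # A seeded shuffle makes the split reproducible; a different seed
    # yields a different shuffle of the same labeled data.
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * n_images))
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# Example: 200 labeled images, 95% used for training.
train_idx, test_idx = split_train_test(200, train_fraction=0.95, seed=1)
```

The held-out test images are never shown to the network during training, so the error on them indicates how well the network generalizes.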
(6) Training the deep neural network:
The folder pose-tensorflow contains an earlier, minimal variant of DeeperCut that is sufficient for our purposes and that we tested for TensorFlow 1.0 (which we recommend). All features other than simultaneous evaluation of multiple snapshots (Step1_EvaluateModelonDataset.py) work for TensorFlow versions up to 1.4. This script also works for single snapshots, see Issue 8 for details. Before training a model for the first time you need to download the weights for the ResNet pretrained on ImageNet from tensorflow.org (~200MB). To do that:
$ cd pose-tensorflow/models/pretrained
$ ./download.sh
Next copy the two folders generated in step (5) Formatting the data II into the models folder of pose-tensorflow (i.e. pose-tensorflow/models/). We have already done this for the demo, which you will find there. To transfer:
$ cp -R YOURexperimentNameTheDate-trainset95shuffle1 ../pose-tensorflow/models/
$ cp -R UnaugmentedDataSet_YOURexperimentNameTheDate/ ../pose-tensorflow/models/
Then (in a terminal) navigate to the subfolder "train" of the model folder, i.e. in our case
$ cd pose-tensorflow/models/reachingJan30-trainset95shuffle1/train
and then start training (good luck!)
$ TF_CUDNN_USE_AUTOTUNE=0 CUDA_VISIBLE_DEVICES=0 python3 ../../../train.py
If your machine has multiple GPUs, you can select which GPU you want to run on by setting the environment variable CUDA_VISIBLE_DEVICES, e.g. CUDA_VISIBLE_DEVICES=0.
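When working from a Jupyter Notebook or an IDE rather than a terminal, the same environment variables can be set from within Python. The following is a small sketch under the assumption that it runs before TensorFlow initializes the GPU, which is most simply ensured by placing it before the TensorFlow import:

```python
import os

# Restrict TensorFlow to the first GPU ("0"). This has to happen before
# TensorFlow initializes CUDA, so set it before importing TensorFlow.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
# Mirrors the TF_CUDNN_USE_AUTOTUNE=0 setting of the training command above.
os.environ["TF_CUDNN_USE_AUTOTUNE"] = "0"

# import tensorflow as tf  # import only after the variables are set
```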
Tips: You can also stop during training (Ctrl-C) and restart from a snapshot (aka checkpoint): just change the init_weights term, i.e. instead of "init_weights: ../../pretrained/resnet_v1_50.ckpt" put "init_weights: ./snapshot-insertthe#ofstepshere" (e.g. 10,000). Train for several thousand iterations until the loss plateaus.
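Editing the init_weights entry by hand is usually easiest. If you prefer to script the change, a minimal sketch could look like this; the helper `set_init_weights` is hypothetical, and the two-line configuration text is made up for the example rather than copied from a real pose_cfg.yaml:

```python
import re


def set_init_weights(cfg_text, snapshot_path):
    """Return cfg_text with its init_weights entry set to snapshot_path."""
    pattern = re.compile(r"^([ \t]*init_weights:[ \t]*).*$", flags=re.MULTILINE)
    # A function replacement avoids any backslash interpretation in the path.
    new_text, n_subs = pattern.subn(lambda m: m.group(1) + snapshot_path,
                                    cfg_text)
    if n_subs != 1:
        raise ValueError("expected exactly one init_weights entry")
    return new_text


# Made-up minimal configuration text for illustration.
cfg = "dataset: example.mat\ninit_weights: ../../pretrained/resnet_v1_50.ckpt\n"
new_cfg = set_init_weights(cfg, "./snapshot-10000")
```

All other lines of the configuration are left untouched, so comments and parameter descriptions in the file are preserved.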
(7) Evaluate your network:
In the folder "Evaluation-tools", you will find code to evaluate the performance of the trained network on the whole data set (train and test images).
$ CUDA_VISIBLE_DEVICES=0 python3 Step1_EvaluateModelonDataset.py #to evaluate your model [needs TensorFlow]
$ python3 Step2_AnalysisofResults.py #to compute test & train errors for your trained model
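To illustrate what comparing train and test errors means, the sketch below computes the average pixel distance between human labels and network predictions for a chosen set of images. It is an assumption that a distance of this kind is a useful summary; the data layout and the function name are invented for the example and do not reflect the internals of Step2_AnalysisofResults.py.

```python
import math


def mean_euclidean_error(labels, predictions, indices):
    """Average pixel distance between labels and predictions.

    labels, predictions: lists (one entry per image) of dicts mapping a
        body part name to an (x, y) tuple, or to None if not labeled.
    indices: image indices to include (e.g. the train or the test set).
    """
    distances = []
    for i in indices:
        for part, truth in labels[i].items():
            if truth is None:
                continue  # body part was not visible in this image
            px, py = predictions[i][part]
            distances.append(math.hypot(px - truth[0], py - truth[1]))
    if not distances:
        raise ValueError("no labeled points in the selected images")
    return sum(distances) / len(distances)


# Toy data: three images, one body part, the last image unlabeled.
labels = [{"hand": (10.0, 10.0)}, {"hand": (20.0, 20.0)}, {"hand": None}]
preds = [{"hand": (13.0, 14.0)}, {"hand": (20.0, 21.0)}, {"hand": (5.0, 5.0)}]
train_error = mean_euclidean_error(labels, preds, indices=[0, 2])
test_error = mean_euclidean_error(labels, preds, indices=[1])
```

A test error that is much larger than the train error suggests that the network does not generalize well and that more (or more diverse) labeled frames may be needed.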
(8) Run the trained network on videos and analyze results
After successfully training and finding low generalization error for the network, you can extract labeled points and poses from all videos and plot them above frames. Of course one can use the extracted poses in many other ways.
- To begin, first edit the myconfig_analysis.py file.
- For extracting posture from a folder with videos run:
$ CUDA_VISIBLE_DEVICES=0 python3 AnalyzeVideos.py
The postures per frame are stored in a MultiIndex pandas array, which contains the network and bodypart information as well as the coordinates. These arrays are stored in the efficient Hierarchical Data Format (HDF). The data can also be exported in e.g. ".csv" format or many other formats (see DeepLabCut#17).
- Then you can make labeled videos ("MakingLabeledVideo.py"). This script creates the video by storing each frame individually and then combining it into a video. This code is slow, but useful for creating flexible high-quality videos (as it has full matplotlib functionality) and looking at individual frames. There is also a "MakingLabeledVideo_fast.py" script that is much faster, but less flexible. These scripts also illustrate how to load and work with the pose-estimation data.
$ python3 MakingLabeledVideo_fast.py
- Issue Tracker: https://github.com/AlexEMG/DeepLabCut/issues
- Source Code: https://github.com/AlexEMG/DeepLabCut
If you are having issues, please let us know (Issue Tracker). For questions feel free to reach out to: [[email protected]] or [[email protected]] Join our Slack user group: (deeplabcut.slack.com) (please email Mackenzie to join!).
Alexander Mathis, Mackenzie Mathis, and the DeeperCut authors for the feature detector code. Edits and suggestions by Jonas Rauber, Taiga Abe, Hao Wu, Jonny Saunders and Brandon Forys. The feature detector code is based on Eldar Insafutdinov's TensorFlow implementation of DeeperCut. Please check out the following references for details:
@inproceedings{insafutdinov2017cvpr,
title = {ArtTrack: Articulated Multi-person Tracking in the Wild},
booktitle = {CVPR'17},
url = {http://arxiv.org/abs/1612.01465},
author = {Eldar Insafutdinov and Mykhaylo Andriluka and Leonid Pishchulin and Siyu Tang and Evgeny Levinkov and Bjoern Andres and Bernt Schiele}
}
@inproceedings{insafutdinov2016eccv,
title = {DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model},
booktitle = {ECCV'16},
url = {http://arxiv.org/abs/1605.03170},
author = {Eldar Insafutdinov and Leonid Pishchulin and Bjoern Andres and Mykhaylo Andriluka and Bernt Schiele}
}
@misc{1804.03142,
Author = {Alexander Mathis and Pranav Mamidanna and Taiga Abe and Kevin M. Cury and Venkatesh N. Murthy and Mackenzie W. Mathis and Matthias Bethge},
Title = {Markerless tracking of user-defined features with deep learning},
Year = {2018},
Eprint = {arXiv:1804.03142},
}
This project is licensed under the GNU Lesser General Public License v3.0.