Include readme.md in sphinx documentation
Kalior committed Aug 30, 2018
1 parent 694fb8c commit cf9ec09
Showing 16 changed files with 69 additions and 49 deletions.
31 changes: 20 additions & 11 deletions README.md
# Action recognition based on [OpenPose](https://github.com/CMU-Perceptual-Computing-Lab/openpose/) and TDA (using [Gudhi](http://gudhi.gforge.inria.fr/python/latest/) and [sklearn_tda](https://github.com/MathieuCarriere/sklearn_tda))

## Performance

The ensemble classifier achieves an accuracy of `0.912` on custom data. However, there is still a need to capture more data to see how well it would generalise over different actors and scenes. The TDA classifier on its own achieves an accuracy of `0.823` on the same data.

## From video to action detection

The pipeline can be run in its entirety using the following scripts (also collected in [`all.sh`](all.sh); see `--help` on each script for options and parameters). The first step is to generate data and train a classifier:

```bash
# … (data-generation and training commands collapsed in the diff view)
```

The last step creates a trained classifier (in a `.pkl` file). This classifier can then be used for live prediction:

```bash
python3.6 live_prediction.py --classifier classifier.pkl --video test-0.mp4
```

The script will output the identified actions, a video with the predictions overlaid on the original video, and a video per predicted action.

Most of what the scripts do is wrap input and output for the different modules in [`action_recognition`](action_recognition/). The final script (`live_prediction.py`) is the exception: it also aggregates some of the predictions and decides which parts of each track should be used to predict actions. Below is a description of what each of these scripts does:

#### generate_tracks.py

* The [`generate_tracks.py`](generate_tracks.py) script creates tracks of people identified in a video for dataset creation.
* Creates a [`tracker.Tracker`](action_recognition/tracker/tracker.py) object with either [`detector.CaffeOpenpose`](action_recognition/detector/caffe_openpose.py) (CMU's original implementation) or [`detector.TFOpenpose`](action_recognition/detector/tf_openpose.py) (faster, but did not deliver the same level of accuracy for me). It also requires an output directory, to which it writes the processed videos and tracks.
* Produces two files: a video with the identified keypoints overlaid on the original video, and a file called `{path_to_video}-tracks.npz` containing two numpy arrays: `tracks` (the keypoints of each identified person in the video) and `frames` (the corresponding frame numbers for each identified person, primarily useful for later visualisation of the keypoints).
* Each track is a `[n_frames, n_keypoints, 3]` `numpy.ndarray` that is predicted to be a single person across several frames, making the final output array of shape `[n_tracks, n_frames, n_keypoints, 3]`, where the values are (x, y, confidence). The frames array correspondingly has the shape `[n_tracks, n_frames, 1]`. Note, however, that both arrays will be ndarrays with `dtype=object`, since `n_frames` differs per track.
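
For reference, here is a minimal sketch of how such a tracks file can be inspected (the file name is hypothetical; the array names `tracks` and `frames` are as described above):

```python
import numpy as np

# Load the output of generate_tracks.py (hypothetical file name).
# allow_pickle is needed because the arrays have dtype=object.
data = np.load("video-0-tracks.npz", allow_pickle=True)
tracks, frames = data["tracks"], data["frames"]

print(len(tracks))      # number of identified tracks
print(tracks[0].shape)  # (n_frames, n_keypoints, 3): x, y, confidence
print(frames[0][:5])    # frame numbers of the first detections in track 0
```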

#### create_dataset.py

* Run the [`create_dataset.py`](create_dataset.py) script to create a dataset from tracks. If you have not previously labelled the data, the labelling process will either let you look through the videos and discard bad chunks (if there are timestamps for the videos with corresponding labels), or have you label the data manually by displaying each chunk and asking which label to attach to it.
* The script outputs `{name}-train.npz` and `{name}-test.npz` files containing the corresponding `chunks`, `frames`, `labels`, and `videos` of the train and test sets. Note that the `frames` and `videos` are only used for visualisation of the data.
* The labelling process only needs to be done once, after which a `.json` file is created per tracks file; this file can be manually edited and will be parsed for labels on subsequent runs.
* During the creation of the dataset, before the tracks are divided into chunks, there are also several post-processing steps:
  1. Merge tracks that are very close to each other at their ends, or throughout a longer period.
  2. Remove tracks that are too short (under 15 frames).
  3. Fill in missing keypoints (OpenPose sometimes does not output every keypoint) by maintaining the same distance to a connected keypoint (e.g. wrist-elbow) as when the keypoint was last seen. This increases the accuracy of the classifier later on.
  4. Fill in missing frames by interpolating the positions of every keypoint, so that chunks don't incorrectly contain more movement just because OpenPose lost track of a person for a couple of frames (a sketch of this step follows after this list).
* If there are multiple datasets that you wish to combine, the [`combine_datasets.py`](combine_datasets.py) script lets you do exactly that.
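
To illustrate step 4 above, here is a hypothetical sketch of filling in missing frames with linear interpolation; it shows the general technique, not the repository's actual implementation:

```python
import numpy as np

def interpolate_missing_frames(track, frames):
    """Linearly interpolate keypoints for frames where the person was lost.

    track: [n_observed, n_keypoints, 3] array of (x, y, confidence) values.
    frames: frame numbers at which the person was actually observed.
    """
    full = np.arange(frames[0], frames[-1] + 1)  # every frame in the track's span
    filled = np.empty((len(full), track.shape[1], 3))
    for k in range(track.shape[1]):
        for c in range(3):  # interpolate x, y, and confidence separately
            filled[:, k, c] = np.interp(full, frames, track[:, k, c])
    return filled, full
```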

#### visualise_dataset.py

* If you wish, you can now run [`visualise_dataset.py`](visualise_dataset.py), with any of its options, to get an idea of, for instance, how the point clouds look or how well the features from the feature engineering separate the different classes.

#### train_classifier.py

* The [`train_classifier.py`](train_classifier.py) script finally trains a classifier on the data. It accepts a dataset as input (without the `-test` and `-train` suffix) and an option to run either [`--feature-engineering`](action_recognition/classifiers/feature_engineering_classifier.py), [`--tda`](action_recognition/classifiers/tda_classifier.py), or [`--ensemble`](action_recognition/classifiers/ensemble_classifier.py). Each option produces a confusion matrix for the classifier on the test set. The `--feature-engineering` option trains a classifier on hand-selected features. The `--tda` option runs a Sliced Wasserstein kernel on the persistence diagrams of the point clouds generated from the data. The `--ensemble` option combines the Sliced Wasserstein kernel with the feature engineering using a voting classifier.
* The pipeline for the TDA calculation has 7 steps (recall that the data was split into chunks by [`create_dataset.py`](create_dataset.py)):
1. Extract certain keypoints (the neck, ankles, and wrists have worked the best for me), which both speeds up the computation and increases accuracy.
  … (steps 2-7 are collapsed in this diff view)
* The training can take a couple of minutes, naturally longer for the TDA calculations than for the pure feature engineering. The Sliced Wasserstein kernel is the computation that takes the longest (but thankfully prints its progress), roughly 1.6 times longer than the next most time-consuming operation, the persistence calculation of the AlphaComplex, which takes place just before it.
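
The resulting `.pkl` file can be loaded like any pickled scikit-learn-style estimator. A minimal sketch, assuming the dataset file names from above and that `predict` accepts chunks directly:

```python
import pickle

import numpy as np

# Hypothetical file names; use whatever train_classifier.py produced.
with open("classifier.pkl", "rb") as f:
    classifier = pickle.load(f)

test = np.load("dataset-test.npz", allow_pickle=True)
predictions = classifier.predict(test["chunks"])  # assumption: chunks accepted as-is
print(np.mean([p == l for p, l in zip(predictions, test["labels"])]))
```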

#### live_prediction.py
* [`live_prediction.py`](live_prediction.py) takes a trained classifier and uses [`tracker.Tracker`](action_recognition/tracker/tracker.py) to yield identified tracks of people in the video.
* On each such track (every 20th frame), it does post-processing (using [`analysis.PostProcessor`](action_recognition/analysis/post_processor.py)) and then takes the latest 50, 30, 25, and 20 frames (somewhat arbitrary window sizes) as chunks for which actions are predicted. The most likely action (highest probability/confidence from the classifier) over all chunks is selected as the action for the person.
* If the confidence for a classification falls below a user-specified threshold, the prediction is discarded (see the sketch after this list).
* It also tries to detect whether a person moves through, e.g., a checkout area without stopping, by checking whether the person keeps moving over several consecutive frames.
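
The window-and-threshold logic could look roughly like this hypothetical sketch (it assumes a scikit-learn-style classifier with `predict_proba` and `classes_`; `extract_chunk` stands in for the post-processing that turns the latest `w` frames into a chunk):

```python
WINDOWS = (50, 30, 25, 20)  # the frame windows described above

def predict_action(classifier, track, extract_chunk, threshold=0.8):
    """Return the most confident prediction over all windows, or None."""
    best_label, best_conf = None, 0.0
    for w in WINDOWS:
        if len(track) < w:  # track not yet long enough for this window
            continue
        chunk = extract_chunk(track, w)
        probs = classifier.predict_proba([chunk])[0]
        if probs.max() > best_conf:
            best_conf = probs.max()
            best_label = classifier.classes_[probs.argmax()]
    # Discard predictions below the user-specified confidence threshold.
    return (best_label, best_conf) if best_conf >= threshold else (None, best_conf)
```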

## Dockerfiles

There are currently four dockerfiles: three corresponding to natural divisions of the dependencies, and one with every dependency:

* [`dockerfiles/Dockerfile-openpose-gpu`](dockerfiles/Dockerfile-openpose-gpu): the GPU version of OpenPose; allows the OpenPose parts of this project to be run.
* [`dockerfiles/Dockerfile-openpose-cpu`](dockerfiles/Dockerfile-openpose-cpu): the CPU version of OpenPose.
* [`dockerfiles/Dockerfile-tda`](dockerfiles/Dockerfile-tda): contains `Gudhi` and `sklearn_tda` for the classification part of the project.
* [`Dockerfile`](Dockerfile): installs both OpenPose (assuming a GPU) and the TDA libraries. This file could do with some cleanup using build stages.

After building the Dockerfiles, there is a script [`dev.sh`](dev.sh) which runs the container and mounts the source directory as well as the expected locations of the data. It is provided more out of convenience than anything else and may need some modification depending on your configuration.

## Recording videos

There is a helper script, [`record_videos.py`](record_videos.py), for producing timestamps for labels while recording videos. It requires a video name, a path to the camera device, and a video size. It prompts the user in multiple steps: first, it asks whether to record video or stop recording; second, it prompts for a label for the timestamp. These steps repeat until the user quits the program. The produced timestamps are read by [`create_dataset.py`](create_dataset.py) to help reduce labelling time.
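
A hypothetical sketch of that prompt loop (the actual script's flow and timestamp format may differ):

```python
import json
import time

# Hypothetical sketch of the record_videos.py prompting flow.
timestamps = []
while True:
    command = input("record / stop / quit? ").strip()
    if command == "quit":
        break
    if command == "record":
        label = input("Label for this timestamp: ").strip()
        timestamps.append({"label": label, "start": time.time()})
    elif command == "stop" and timestamps:
        timestamps[-1]["end"] = time.time()

# Hypothetical output file name and format.
with open("video-0-timestamps.json", "w") as f:
    json.dump(timestamps, f)
```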

## Issues with the approach

* A bit slow: OpenPose takes 0.5 s/frame, and the TDA classifier takes 3 s per person and prediction. The cost comes mainly from the kernel calculation in sklearn_tda and the persistence calculation in the gudhi library. Both have parameters that can be tuned (see [TDAClassifier.\_pre_validated_pipeline()](action_recognition/classifiers/tda_classifier.py#L113)), at the expense of accuracy.
* We are restricted to 2D positions (a limitation of OpenPose), which makes classification harder.
* OpenPose can be quite jittery, especially when using lower resolutions.
3 changes: 2 additions & 1 deletion action_recognition/__init__.py
actions.
"""

__all__ = ['analysis', 'augmentors', 'classifiers',
'detector', 'features', 'tracker', 'transforms', 'util']
3 changes: 2 additions & 1 deletion action_recognition/tracker/track.py
def copy(self, number_of_frames):
    # …
    return new_track

def add_person(self, person, current_frame):
"""Adds a Person to the track
"""Adds a Person to the track.
Parameters
----------
person : Person object
8 changes: 4 additions & 4 deletions doc/action_recognition.analysis.rst
action\_recognition.analysis package
:show-inheritance:

action\_recognition.analysis.chunk\_visualiser
----------------------------------------------

.. automodule:: action_recognition.analysis.chunk_visualiser
:members:
:undoc-members:
:show-inheritance:

action\_recognition.analysis.labelling
--------------------------------------

.. automodule:: action_recognition.analysis.labelling
:members:
:undoc-members:
:show-inheritance:

action\_recognition.analysis.mapper
-----------------------------------

.. automodule:: action_recognition.analysis.mapper
:members:
:undoc-members:
:show-inheritance:

action\_recognition.analysis.post\_processor
--------------------------------------------

.. automodule:: action_recognition.analysis.post_processor
:members:
2 changes: 1 addition & 1 deletion doc/action_recognition.augmentors.rst
action\_recognition.augmentors package
:show-inheritance:

action\_recognition.augmentors.rotate
-------------------------------------

.. automodule:: action_recognition.augmentors.rotate
:members:
8 changes: 4 additions & 4 deletions doc/action_recognition.classifiers.rst
action\_recognition.classifiers package
:show-inheritance:

action\_recognition.classifiers.classification\_visualiser
----------------------------------------------------------

.. automodule:: action_recognition.classifiers.classification_visualiser
:members:
:undoc-members:
:show-inheritance:

action\_recognition.classifiers.ensemble\_classifier
----------------------------------------------------

.. automodule:: action_recognition.classifiers.ensemble_classifier
:members:
:undoc-members:
:show-inheritance:

action\_recognition.classifiers.tda\_classifier
-----------------------------------------------

.. automodule:: action_recognition.classifiers.tda_classifier
:members:
:undoc-members:
:show-inheritance:

action\_recognition.classifiers.feature\_engineering\_classifier
----------------------------------------------------------------

.. automodule:: action_recognition.classifiers.feature_engineering_classifier
:members:
4 changes: 2 additions & 2 deletions doc/action_recognition.detector.rst
action\_recognition.detector package
:show-inheritance:

action\_recognition.detector.caffe\_openpose
--------------------------------------------

.. automodule:: action_recognition.detector.caffe_openpose
:members:
:undoc-members:
:show-inheritance:

action\_recognition.detector.tf\_openpose
-----------------------------------------

.. automodule:: action_recognition.detector.tf_openpose
:members:
10 changes: 5 additions & 5 deletions doc/action_recognition.features.rst
action\_recognition.features package
:show-inheritance:

action\_recognition.features.amount\_of\_movement
-------------------------------------------------

.. automodule:: action_recognition.features.amount_of_movement
:members:
:undoc-members:
:show-inheritance:

action\_recognition.features.angle\_change\_speed
-------------------------------------------------

.. automodule:: action_recognition.features.angle_change_speed
:members:
:undoc-members:
:show-inheritance:

action\_recognition.features.average\_speed
-------------------------------------------

.. automodule:: action_recognition.features.average_speed
:members:
:undoc-members:
:show-inheritance:

action\_recognition.features.keypoint\_distance
-----------------------------------------------

.. automodule:: action_recognition.features.keypoint_distance
:members:
:undoc-members:
:show-inheritance:

action\_recognition.features.feature\_visualiser
------------------------------------------------

.. automodule:: action_recognition.features.feature_visualiser
:members:
1 change: 1 addition & 0 deletions doc/action_recognition.rst
Subpackages
.. toctree::

action_recognition.analysis
action_recognition.augmentors
action_recognition.classifiers
action_recognition.detector
action_recognition.features
4 changes: 2 additions & 2 deletions doc/action_recognition.tests.rst
action\_recognition.tests package


action\_recognition.tests.test\_track
-------------------------------------

.. automodule:: action_recognition.tests.test_track
:members:
:undoc-members:
:show-inheritance:

action\_recognition.tests.test\_tracker
---------------------------------------

.. automodule:: action_recognition.tests.test_tracker
:members:
8 changes: 4 additions & 4 deletions doc/action_recognition.tracker.rst
action\_recognition.tracker package
:show-inheritance:

action\_recognition.tracker.person
----------------------------------

.. automodule:: action_recognition.tracker.person
:members:
:undoc-members:
:show-inheritance:

action\_recognition.tracker.track
---------------------------------

.. automodule:: action_recognition.tracker.track
:members:
:undoc-members:
:show-inheritance:

action\_recognition.tracker.track\_visualiser
---------------------------------------------

.. automodule:: action_recognition.tracker.track_visualiser
:members:
:undoc-members:
:show-inheritance:

action\_recognition.tracker.tracker
-----------------------------------

.. automodule:: action_recognition.tracker.tracker
:members: