
Visual Referee Challenge


For the Visual Referee Challenge at RoboCup 2022, we developed a simple yet effective deep learning model that recognizes referee gestures from short sequences of frames. Augmenting the training data increased the robustness of the system.

Augmentation and Dataset

Different members of the team and university mimicked the different poses to create a preliminary dataset. The poses were recorded in front of a green screen so that they could be transferred onto different backgrounds. This increased the dataset size and made the model robust to new backgrounds.
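The following is a minimal sketch of how such green-screen background substitution could be implemented with OpenCV; the actual augmentation script is not part of this page, and the HSV key range and median-blur step are assumptions to be tuned for the real footage.

```python
import cv2
import numpy as np

def replace_green_background(frame_bgr: np.ndarray, background_bgr: np.ndarray) -> np.ndarray:
    """Swap green-screen pixels in frame_bgr with pixels from background_bgr."""
    # Match the background size to the recorded frame.
    background = cv2.resize(background_bgr, (frame_bgr.shape[1], frame_bgr.shape[0]))
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Rough HSV range for a typical green screen (an assumption, not a measured value).
    mask = cv2.inRange(hsv, (35, 80, 80), (85, 255, 255))
    mask = cv2.medianBlur(mask, 5)  # suppress speckle noise in the key
    out = frame_bgr.copy()
    out[mask > 0] = background[mask > 0]
    return out
```

Applying this to every recorded clip with a pool of background images multiplies the number of training sequences without additional recording sessions.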

Model

The model is based on 3D convolutions followed by an LSTM. It takes 15 frames as input and outputs a probability distribution over the gesture labels. An additional "no pose" dustbin label is included to reduce false positives.

(Figure: Visual Referee Gesture Recognizer)
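As a rough illustration of the architecture described above, here is a minimal PyTorch sketch; the channel counts, pooling layout, and hidden size are assumptions for demonstration, not the team's actual configuration.

```python
import torch
import torch.nn as nn

class GestureRecognizer(nn.Module):
    def __init__(self, num_gestures: int, hidden_size: int = 128):
        super().__init__()
        # 3D convolutions extract short-range spatio-temporal features.
        self.conv = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.AdaptiveAvgPool3d((None, 4, 4)),
        )
        # The LSTM aggregates per-frame features over the 15-frame window.
        self.lstm = nn.LSTM(32 * 4 * 4, hidden_size, batch_first=True)
        # One extra output for the "no pose" dustbin label.
        self.head = nn.Linear(hidden_size, num_gestures + 1)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, 3, 15, height, width)
        feats = self.conv(clip)                          # (B, 32, 15, 4, 4)
        feats = feats.permute(0, 2, 1, 3, 4).flatten(2)  # (B, 15, 512)
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1]).softmax(dim=-1)        # class probabilities
```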

If the confidence score of the prediction does not exceed a certain threshold, the recognizer processes another set of 15 frames and repeats until the threshold is met.
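A sketch of this thresholded loop is shown below; the `get_frame` callback and the 0.8 threshold are placeholders rather than values from the actual system.

```python
import torch

def recognize_gesture(model, get_frame, threshold: float = 0.8) -> int:
    """Classify 15-frame windows until one prediction is confident enough."""
    model.eval()
    while True:
        # get_frame() is assumed to return one (3, H, W) image tensor.
        frames = torch.stack([get_frame() for _ in range(15)], dim=1)  # (3, 15, H, W)
        with torch.no_grad():
            probs = model(frames.unsqueeze(0))[0]  # probability per label
        confidence, label = probs.max(dim=0)
        if confidence.item() >= threshold:
            return int(label.item())
        # Below threshold: discard this window and grab the next 15 frames.
```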
