
Visual Referee Challenge


For the Visual Referee Challenge at RoboCup 2022, we developed a simple yet effective deep learning model that recognizes referee gestures from short sequences of frames. Augmenting the training data increased the robustness of the system.

Augmentation and Dataset

Different members of the team and university mimicked the different poses to create a preliminary dataset. The poses were recorded in front of a green screen so that they could be transferred onto different backgrounds. This increased the dataset size and made the model robust to new backgrounds.
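The following is a minimal sketch of how such green-screen background substitution could be implemented with OpenCV; the actual augmentation script is not part of this page, and the HSV key range and median-blur step are assumptions to be tuned for the real footage.

```python
import cv2
import numpy as np

def replace_green_background(frame_bgr: np.ndarray, background_bgr: np.ndarray) -> np.ndarray:
    """Swap green-screen pixels in frame_bgr with pixels from background_bgr."""
    # Match the background size to the recorded frame.
    background = cv2.resize(background_bgr, (frame_bgr.shape[1], frame_bgr.shape[0]))
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Rough HSV range for a typical green screen (an assumption, not a measured value).
    mask = cv2.inRange(hsv, (35, 80, 80), (85, 255, 255))
    mask = cv2.medianBlur(mask, 5)  # suppress speckle noise in the key
    out = frame_bgr.copy()
    out[mask > 0] = background[mask > 0]
    return out
```

Applying this to every recorded clip with a pool of background images multiplies the number of training sequences without additional recording sessions.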

Model

The model is based on 3D convolutions followed by an LSTM. It takes 15 frames as input and outputs a probability distribution over the gesture labels. An additional "no pose" dustbin label is included to reduce false positives.

(Figure: Visual Referee Gesture Recognizer)
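As a rough illustration of the architecture described above, here is a minimal PyTorch sketch; the channel counts, pooling layout, and hidden size are assumptions for demonstration, not the team's actual configuration.

```python
import torch
import torch.nn as nn

class GestureRecognizer(nn.Module):
    def __init__(self, num_gestures: int, hidden_size: int = 128):
        super().__init__()
        # 3D convolutions extract short-range spatio-temporal features.
        self.conv = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.AdaptiveAvgPool3d((None, 4, 4)),
        )
        # The LSTM aggregates per-frame features over the 15-frame window.
        self.lstm = nn.LSTM(32 * 4 * 4, hidden_size, batch_first=True)
        # One extra output for the "no pose" dustbin label.
        self.head = nn.Linear(hidden_size, num_gestures + 1)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, 3, 15, height, width)
        feats = self.conv(clip)                          # (B, 32, 15, 4, 4)
        feats = feats.permute(0, 2, 1, 3, 4).flatten(2)  # (B, 15, 512)
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1]).softmax(dim=-1)        # class probabilities
```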

If the confidence score of the prediction does not exceed a certain threshold, the recognizer processes another set of 15 frames and repeats until the threshold is met.
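A sketch of this thresholded loop is shown below; the `get_frame` callback and the 0.8 threshold are placeholders rather than values from the actual system.

```python
import torch

def recognize_gesture(model, get_frame, threshold: float = 0.8) -> int:
    """Classify 15-frame windows until one prediction is confident enough."""
    model.eval()
    while True:
        # get_frame() is assumed to return one (3, H, W) image tensor.
        frames = torch.stack([get_frame() for _ in range(15)], dim=1)  # (3, 15, H, W)
        with torch.no_grad():
            probs = model(frames.unsqueeze(0))[0]  # probability per label
        confidence, label = probs.max(dim=0)
        if confidence.item() >= threshold:
            return int(label.item())
        # Below threshold: discard this window and grab the next 15 frames.
```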
