TL;DR: New self-supervised learning method for motion prediction in self-driving applications.
Road Barlow Twins. Pre-training uses plain map data only; the objective is to learn similar embeddings for differently augmented views of the same map data. Fine-tuning then uses annotated samples with past traffic agent trajectories to adapt the model for motion prediction. A minimal sketch of the pre-training objective is shown below.
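For reference, the redundancy-reduction objective of Barlow Twins (Zbontar et al., 2021) that the pre-training stage builds on can be summarized as follows. This is a minimal PyTorch sketch, not the repo's implementation; `map_encoder`, `augment`, and `lambda_offdiag` are illustrative placeholders.

```python
import torch


def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                      lambda_offdiag: float = 5e-3) -> torch.Tensor:
    """Push the cross-correlation matrix of the embeddings of two
    augmented views toward the identity matrix."""
    n, _ = z_a.shape
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(dim=0)) / (z_a.std(dim=0) + 1e-6)
    z_b = (z_b - z_b.mean(dim=0)) / (z_b.std(dim=0) + 1e-6)
    # Empirical (d x d) cross-correlation matrix between the two views.
    c = (z_a.T @ z_b) / n
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()                # diagonal -> 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()   # off-diagonal -> 0
    return on_diag + lambda_offdiag * off_diag


# Hypothetical usage; `map_encoder` and `augment` stand in for the map
# backbone and augmentation pipeline, which are not shown here.
# view_a, view_b = augment(map_batch), augment(map_batch)
# loss = barlow_twins_loss(map_encoder(view_a), map_encoder(view_b))
```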
We continued this work in this repository fork, focusing on learned token set representations instead of rasterized image representations.
Register for and download the dataset from here. Then clone this repo and run the prerender script as described in the README.
The local attention (Beltagy et al., 2020) and cross-attention (Chen et al., 2021) implementations are from lucidrains' vit_pytorch library. The baseline DualMotionViT model builds upon the work by Konev et al. (2022).