WARNING: this dataset might contain nudity
This project contains two sections: the mods used to create and record game sequences, and the scripts needed to build the dataset. The mods are an adaptation of the JTA-Mods by fabbrimatteo, while the scripts were developed starting from those in the JTA-Dataset repository, also by fabbrimatteo. The original works focus on pedestrian pose estimation and tracking in urban scenarios, whereas this project targets pedestrian re-identification. I used this dataset for my master's thesis.
This synthetic dataset was generated using the graphics engine of Grand Theft Auto V. It contains 538 individuals captured in several urban scenarios from a total of 19 camera positions (both overlapping and non-overlapping). Every pedestrian was recorded by at least 2 and at most 5 cameras, with an average of 3.5. The video sequences were recorded both during the day and at night, under different weather conditions (blizzard, rain, clear), so the same pedestrian may appear under different illumination and weather settings. The assumption is that each identity keeps the same clothing in all the videos. The total number of bounding boxes is 94312, ranging from 29 to 496 per pedestrian. For more statistics, see this file.
To simulate a realistic scenario, the generated bounding boxes are not "perfect" and can include body parts of other pedestrians, or entire pedestrians in the background. The bounding boxes also contain occlusions between pedestrians, they can appear pixelated when the pedestrian was recorded far from the camera, and the scenarios are relatively crowded. Some examples are shown below.
Some individuals wear similar clothes even though they are different identities. This happens especially for those wearing a uniform or a suit, and in low-illumination settings. In the picture below, each column shows a pair of pedestrians with similar appearance.
There are different versions of this dataset:
- Download the raw frames (and joint information) of our recorded game sequences here
- Download the dataset with the selected bounding boxes for each individual here
- Download the dataset with the selected bounding boxes split into training and testing sets here
If you would like to record your own scenarios or select different bounding boxes from our recorded sequences, visit the home page of our wiki.
A scenario is a particular location on the game map. A sequence is a recorded video that can contain up to 2 camera views (each recorded at 1920x1080, 30 FPS) and can be split into up to 2 videos. A sequence can be recorded both during the day and at night with the same pedestrians. A camera is identified by its coordinates and axis rotation.
The following example explains the bounding box naming convention: 42647445_sd000c1_0396.jpeg
- 42647445 is the pedestrian identifier
- sd means that the sequence was recorded during the day (sn for night)
- 000 is the sequence number
- c1 means that this bounding box was recorded by camera 1 of this sequence (there can be up to 2 cameras in each sequence)
- 0396 is the frame number
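The naming convention described above can be parsed with a single regular expression. The sketch below is not part of the project's scripts: the pattern is inferred only from the example name 42647445_sd000c1_0396.jpeg, so field widths are not enforced (`\d+` is used everywhere), and `parse_bbox_name` and `BBoxName` are hypothetical names. The camera field also accepts two digits so the same parser works on names with a global camera identifier such as c08.

```python
import re
from dataclasses import dataclass

# Pattern inferred from the example 42647445_sd000c1_0396.jpeg (an assumption):
# <ped id>_s<d|n><sequence>c<camera>_<frame>.jpeg
BBOX_NAME = re.compile(
    r"^(?P<ped>\d+)_s(?P<time>[dn])(?P<seq>\d+)c(?P<cam>\d+)_(?P<frame>\d+)\.jpe?g$"
)


@dataclass(frozen=True)
class BBoxName:
    ped_id: str    # pedestrian identifier, kept as a string
    night: bool    # True for "sn" (night), False for "sd" (day)
    sequence: int  # sequence number, e.g. "000" -> 0
    camera: int    # camera index in the sequence (1 or 2), or global id 1-19
    frame: int     # frame number, e.g. "0396" -> 396


def parse_bbox_name(filename: str) -> BBoxName:
    """Split a bounding box file name into its fields (hypothetical helper)."""
    m = BBOX_NAME.match(filename)
    if m is None:
        raise ValueError(f"unexpected bounding box name: {filename!r}")
    return BBoxName(
        ped_id=m["ped"],
        night=m["time"] == "n",
        sequence=int(m["seq"]),
        camera=int(m["cam"]),
        frame=int(m["frame"]),
    )
```

For example, `parse_bbox_name("42647445_sd000c1_0396.jpeg")` yields a `BBoxName` with `ped_id="42647445"`, `night=False`, `sequence=0`, `camera=1` and `frame=396`. The pedestrian identifier is kept as a string so that leading zeros, if any, are not lost.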
In the dataset with the training/testing splits, instead of c1 or c2 you will find the unique identifier of the camera (from 1 to 19), such as c08. There is a script that converts the dataset from the first naming convention to the second one.
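The renaming from per-sequence camera indices to global camera identifiers can be sketched as follows. This is not the project's conversion script: `to_global_camera_name` is a hypothetical helper, and it assumes that only the camera field changes, that global identifiers are zero-padded to two digits (as in c08), and that a mapping from (day/night flag, sequence number, local camera) to the global camera id is available. The values in that mapping are hypothetical and would have to come from the recorded camera positions.

```python
import re
from typing import Dict, Tuple

# Same layout as 42647445_sd000c1_0396.jpeg; groups are
# (ped id + "_s", day/night flag, sequence, local camera, "_frame.ext").
NAME = re.compile(r"^(\d+_s)([dn])(\d+)c(\d+)(_\d+\.jpe?g)$")

# HYPOTHETICAL key: (time of day "d"/"n", sequence number, local camera 1 or 2).
CameraKey = Tuple[str, int, int]


def to_global_camera_name(filename: str, camera_map: Dict[CameraKey, int]) -> str:
    """Replace the per-sequence camera index with a global id (1-19)."""
    m = NAME.match(filename)
    if m is None:
        raise ValueError(f"unexpected bounding box name: {filename!r}")
    prefix, time, seq, local_cam, suffix = m.groups()
    global_cam = camera_map[(time, int(seq), int(local_cam))]
    if not 1 <= global_cam <= 19:
        raise ValueError(f"camera id out of range: {global_cam}")
    # Zero-pad to two digits, as in the example "c08".
    return f"{prefix}{time}{seq}c{global_cam:02d}{suffix}"
```

With a hypothetical mapping `{("d", 0, 1): 8}`, the name 42647445_sd000c1_0396.jpeg becomes 42647445_sd000c08_0396.jpeg. Keying the mapping on the day/night flag as well keeps the sketch valid whether or not day and night recordings of the same sequence share their camera positions.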