Format of JSON ActivityNet-Entities annotation files
### for anet_entities_cleaned_class_thresh50_*.json
-> vocab: the 431 object classes (not including the background class)
-> annotations
    -> [video name]: identifier of the video
        -> duration: duration of the video
        -> segments
            -> [segment id]: segment of the video with bounding box annotations
                -> timestamps: start and end timestamps of the segment
                -> process_clss: object class of each bounding box
                -> tokens: tokenized sentence
                -> frame_ind: frame index of each bounding box
                -> process_idx: index of each object class word in the sentence
                -> process_bnd_box: coordinates of each bounding box (xtl, ytl, xbr, ybr)
                -> crowds: whether each box represents a group of objects
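
A minimal Python sketch of reading this file, assuming the per-box fields (process_clss, frame_ind, process_bnd_box, crowds) are parallel lists as the skeleton above implies; the split suffix in the file name is an assumption:
```
import json

# Load the cleaned annotation file (the exact suffix of the file name
# depends on the split you downloaded; "trainval" here is an assumption).
with open("anet_entities_cleaned_class_thresh50_trainval.json") as f:
    data = json.load(f)

vocab = data["vocab"]  # 431 object classes, background class not included

for video_name, video in data["annotations"].items():
    for segment_id, seg in video["segments"].items():
        # One entry per annotated box: class, frame index, box coordinates.
        for cls, frame, box in zip(seg["process_clss"],
                                   seg["frame_ind"],
                                   seg["process_bnd_box"]):
            xtl, ytl, xbr, ybr = box
            print(video_name, segment_id, cls, frame, (xtl, ytl, xbr, ybr))
```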
### for anet_entities_trainval.json
-> database
    -> [video name]: identifier of the video
        - rwidth: resized width of the video, always 720px
        - rheight: resized height of the video, chosen to maintain the aspect ratio
        -> segments
            -> [segment id]: segment of the video with bounding box annotations
                -> objects
                    -> [object number]: annotated object from the segment
                        -> noun_phrases: list of noun phrase (NP) annotations of the object, each with the phrase text and the index of the word in the sentence
                        - frame_ind: frame index (0-9; each video is divided evenly into 10 clips and the middle frame of each clip is sampled)
                        - xtl: x coordinate of the top-left corner of the bounding box (between 0 and 720)
                        - ytl: y coordinate of the top-left corner of the bounding box
                        - xbr: x coordinate of the bottom-right corner of the bounding box (between 0 and 720)
                        - ybr: y coordinate of the bottom-right corner of the bounding box
                        - crowds: whether the box represents a group of objects
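
A minimal Python sketch of walking this file and normalizing each box to [0, 1] by the resized frame size; the key names follow the skeleton above:
```
import json

# Load the trainval annotations and normalize box coordinates by the
# resized frame dimensions (rwidth is always 720, rheight varies).
with open("anet_entities_trainval.json") as f:
    db = json.load(f)["database"]

for video_name, video in db.items():
    rw, rh = video["rwidth"], video["rheight"]
    for segment_id, seg in video["segments"].items():
        for obj_id, obj in seg["objects"].items():
            # Normalized (xtl, ytl, xbr, ybr); frame_ind selects one of
            # the 10 uniformly sampled frames.
            box = (obj["xtl"] / rw, obj["ytl"] / rh,
                   obj["xbr"] / rw, obj["ybr"] / rh)
            print(video_name, segment_id, obj_id, obj["frame_ind"], box,
                  obj["noun_phrases"])
```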
### an example of grounding evaluation submission files
Evaluation servers for Challenge 2020:
For Sub-task I - GT Captions: https://competitions.codalab.org/competitions/24336
For Sub-task II - Generated Captions: https://competitions.codalab.org/competitions/24334
Evaluation server (general): https://competitions.codalab.org/competitions/22835
Submissions must be named submission_gt.json (for GT mode) or submission_gen.json (for gen mode) and zipped into a single submission.zip file.
See the instructions on the evaluation server for more details.
```
{
    "results": {
        "v_QOlSCBRmfWY": {
            "0": {  # segment id
                "clss": ["room", "woman", "she"],  # object classes
                "idx_in_sent": [8, 2, 12],  # index of each object word in the sentence
                "bbox_for_all_frames": [[[1,2,3,4], …, [1,2,3,4]], [[1,2,3,4], …, [1,2,3,4]], [[1,2,3,4], …, [1,2,3,4]]]  # predicted bbox on all 10 uniformly sampled frames
            }
        }
    },
    "eval_mode": "gen" | "GT",  # depending on whether the results are on generated or GT sentences
    "external_data": {
        "used": true,  # Boolean flag
        "details": "Object detector pre-trained on Visual Genome on object detection task."
    }
}
```
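
A minimal Python sketch of writing and zipping a submission in the format above; the single hard-coded prediction is a placeholder, not real model output:
```
import json
import zipfile

# Assemble one placeholder prediction in the submission format above.
submission = {
    "results": {
        "v_QOlSCBRmfWY": {
            "0": {
                "clss": ["room", "woman", "she"],
                "idx_in_sent": [8, 2, 12],
                # One box per object for each of the 10 sampled frames.
                "bbox_for_all_frames": [[[1, 2, 3, 4]] * 10 for _ in range(3)],
            }
        }
    },
    "eval_mode": "GT",  # or "gen" for generated sentences
    "external_data": {
        "used": True,
        "details": "Object detector pre-trained on Visual Genome on object detection task.",
    },
}

# GT mode expects submission_gt.json; use submission_gen.json for gen mode.
with open("submission_gt.json", "w") as f:
    json.dump(submission, f)

with zipfile.ZipFile("submission.zip", "w") as zf:
    zf.write("submission_gt.json")
```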