The Manga109 dataset contains artificial images of manga (Japanese comics) and annotations for four categories (body, face, frame, and text). Many of its characteristics differ from those of natural images.
The Manga109-s dataset (87 volumes) is a subset of the full Manga109 dataset (109 volumes). Unlike the full Manga109 dataset, the Manga109-s dataset can be used by commercial organizations. To make the results usable by a wide range of users, we conduct experiments on Manga109-s.
Please see this page to download Manga109-s. Please see our manga109api fork to convert the dataset to COCO format. We use the 68train, 4val, and 15test splits. The 15test set was selected to be well-balanced for reliable evaluation.
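As a rough illustration of how the three splits relate, the sketch below partitions a list of volume names using the 15test and 4val volume lists given in the notes of this section. The helper `split_volumes` and the placeholder training volume names are hypothetical and are not part of manga109api or this repository; in practice, `all_volumes` would hold the 87 Manga109-s volume names.

```python
# Volume names copied from the 15test and 4val lists in this section.
TEST_VOLUMES = [
    "Akuhamu", "BakuretsuKungFuGirl", "DollGun", "EvaLady", "HinagikuKenzan",
    "KyokugenCyclone", "LoveHina_vol01", "MomoyamaHaikagura", "TennenSenshiG",
    "UchiNoNyan'sDiary", "UnbalanceTokyo", "YamatoNoHane", "YoumaKourin",
    "YumeNoKayoiji", "YumeiroCooking",
]
VAL_VOLUMES = ["HealingPlanet", "LoveHina_vol14", "SeisinkiVulnus", "That'sIzumiko"]


def split_volumes(all_volumes):
    """Assign each volume name to 'train', 'val', or 'test' (hypothetical helper)."""
    test, val = set(TEST_VOLUMES), set(VAL_VOLUMES)
    # Fail early if a held-out volume is absent from the input list.
    missing = (test | val) - set(all_volumes)
    if missing:
        raise ValueError(f"volumes not found: {sorted(missing)}")
    splits = {"train": [], "val": [], "test": []}
    for name in sorted(all_volumes):
        if name in test:
            splits["test"].append(name)
        elif name in val:
            splits["val"].append(name)
        else:
            # Every volume that is not held out goes to training.
            splits["train"].append(name)
    return splits


# Placeholder training volumes, for illustration only.
example = TEST_VOLUMES + VAL_VOLUMES + ["PlaceholderVolumeA", "PlaceholderVolumeB"]
splits = split_volumes(example)
print({name: len(volumes) for name, volumes in splits.items()})
```

With the full list of 87 Manga109-s volume names as input, the same rule would leave 68 volumes in the training split.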
Method | Backbone | Lr schd | AP | Download |
---|---|---|---|---|
Faster R-CNN | R-50 | 1x | 65.8 | model |
Cascade R-CNN | R-50 | 1x | 67.6 | model |
RetinaNet | R-50 | 1x | 65.3 | model |
ATSS | R-50 | 1x | 66.5 | model |
GFL | R-50 | 1x | 67.3 | model |
DETR | R-50 | 1x | 31.2 | model |
Deformable DETR | R-50 | 1x | 64.1 | model |
Sparse R-CNN | R-50 | 1x | 63.1 | model |
ATSS | Swin-T | 1x | 66.2 | model |
ATSS | ConvNeXt-T | 1x | 67.4 | model |
ATSS+SEPC | R-50 | 1x | 67.1 | model |
ATSS+DyHead | R-50 | 1x | 67.9 | model |
YOLOX-L | CSP v5 | 1x | 70.2 | model |
UniverseNet | R2-50 | 1x | 68.9 | model |
UniverseNet 20.08 | R2-50 | 1x | 69.9 | model |
- In addition to ATSS+SEPC, UniverseNet uses Res2Net-v1b-50, DCN, and multi-scale training (480-960).
- The settings for normalization layers (including whether to use iBN of SEPC) depend on the config files.
- Most models were trained and evaluated using fp16 (mixed precision).
- Each model was fine-tuned from a corresponding COCO pre-trained model.
- 15test:
["Akuhamu", "BakuretsuKungFuGirl", "DollGun", "EvaLady", "HinagikuKenzan", "KyokugenCyclone", "LoveHina_vol01", "MomoyamaHaikagura", "TennenSenshiG", "UchiNoNyan'sDiary", "UnbalanceTokyo", "YamatoNoHane", "YoumaKourin", "YumeNoKayoiji", "YumeiroCooking"]
- 4val:
["HealingPlanet", "LoveHina_vol14", "SeisinkiVulnus", "That'sIzumiko"]
- 68train: All the other volumes
- Please check the dataset licenses (Manga109, Manga109-s).
- The typical scale of the original images is (1654, 1170). The maximum total number of pixels for (1216, 864), used for Manga109, is almost the same as that for (1333, 800), used for COCO.
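The pixel-count comparison in the last note can be checked with a few lines of arithmetic. This is only a sketch: the helper names are hypothetical, and the keep-aspect-ratio rule (fit the long edge and the short edge within the given bounds) is an assumption about the resizing convention, not something stated in this section.

```python
def max_pixels(scale):
    """Upper bound on total pixels for a (long_edge, short_edge) scale."""
    long_edge, short_edge = scale
    return long_edge * short_edge


def keep_ratio_factor(image_size, scale):
    """Scale factor that fits an image inside `scale` while keeping its aspect ratio.

    Assumed convention: the longer image edge is bounded by the larger scale
    value and the shorter image edge by the smaller one.
    """
    long_img, short_img = max(image_size), min(image_size)
    long_max, short_max = max(scale), min(scale)
    return min(long_max / long_img, short_max / short_img)


manga_scale = (1216, 864)
coco_scale = (1333, 800)

manga_pixels = max_pixels(manga_scale)  # 1216 * 864 = 1,050,624
coco_pixels = max_pixels(coco_scale)    # 1333 * 800 = 1,066,400
print(manga_pixels, coco_pixels, round(manga_pixels / coco_pixels, 3))

# A typical (1654, 1170) image is shrunk by roughly 0.735 under this assumption.
print(round(keep_ratio_factor((1654, 1170), manga_scale), 3))
```

The two upper bounds differ by about 1.5%, which is consistent with the note that they are almost the same.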
Users must cite the two papers below when using the dataset in academic papers.
@article{mtap_matsui_2017,
author={Yusuke Matsui and Kota Ito and Yuji Aramaki and Azuma Fujimoto and Toru Ogawa and Toshihiko Yamasaki and Kiyoharu Aizawa},
title={Sketch-based Manga Retrieval using Manga109 Dataset},
journal={Multimedia Tools and Applications},
volume={76},
number={20},
pages={21811--21838},
doi={10.1007/s11042-016-4020-z},
year={2017}
}
@article{multimedia_aizawa_2020,
author={Kiyoharu Aizawa and Azuma Fujimoto and Atsushi Otsubo and Toru Ogawa and Yusuke Matsui and Koki Tsubota and Hikaru Ikuta},
title={Building a Manga Dataset ``Manga109'' with Annotations for Multimedia Applications},
journal={IEEE MultiMedia},
volume={27},
number={2},
pages={8--18},
doi={10.1109/mmul.2020.2987895},
year={2020}
}
Please cite the following paper for the benchmark results: https://arxiv.org/abs/2103.14027
@inproceedings{USB_shinya_BMVC2022,
title={{USB}: Universal-Scale Object Detection Benchmark},
author={Shinya, Yosuke},
booktitle={British Machine Vision Conference (BMVC)},
year={2022}
}