This is the official implementation of our Pattern Recognition (PR) 2024 paper "HTR-VT: Handwritten Text Recognition with Vision Transformer". It is a new and effective baseline for handwritten text recognition that uses only a Vision Transformer and CTC loss.
[Project Page] [Paper] [arXiv] [Google Drive]
Our model can be trained on a single RTX-4090 GPU (24 GB).
conda env create -f environment.yml
conda activate htr
The code was tested on Python 3.9 and PyTorch 1.13.0.
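As a quick sanity check (our own sketch, not part of the repo), you can confirm that your environment matches the tested versions and that the GPU is visible:

```python
import sys
import torch

print(sys.version)        # expect 3.9.x
print(torch.__version__)  # expect 1.13.0
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. an RTX 4090
else:
    print("No CUDA device found")
```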
- We use the IAM, READ2016, and LAM datasets for handwritten text recognition.
IAM
Register on the FKI webpage: https://fki.tic.heia-fr.ch/databases/iam-handwriting-database
Download the dataset from: https://fki.tic.heia-fr.ch/databases/download-the-iam-handwriting-database
READ2016
wget https://zenodo.org/record/1164045/files/{Test-ICFHR-2016.tgz,Train-And-Val-ICFHR-2016.tgz}
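After downloading, extract both archives. A minimal sketch using Python's standard tarfile module; the target path ./data/read2016/ is our assumption, so adjust it to your layout:

```python
import tarfile

# Extract both READ2016 archives (target directory is an assumption;
# match it to your own ./data/ layout).
for archive in ["Test-ICFHR-2016.tgz", "Train-And-Val-ICFHR-2016.tgz"]:
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall("./data/read2016/")
```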
LAM
Download the dataset from here: https://aimagelab.ing.unimore.it/imagelab/page.asp?IdPage=46
- Download the datasets to ./data/. Take IAM as an example; the directory structure should be as follows (a small loading sketch follows the tree):
./data/iam/
├── train.ln
├── val.ln
├── test.ln
└── lines/
    ├── a01-000u-00.png
    ├── a01-000u-00.txt
    ├── a01-000u-01.png
    ├── a01-000u-01.txt
    └── ...
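To make the layout concrete, here is a minimal loading sketch (ours, not the repo's dataloader). It assumes each .ln split file lists one line ID per line (e.g. a01-000u-00) and that each ID has a matching .png image and .txt transcription under lines/:

```python
from pathlib import Path

def load_split(root="./data/iam", split="train"):
    """Pair each line image with its transcription for the given split.

    Assumes the .ln file contains one line ID per line, as in the
    directory tree above.
    """
    root = Path(root)
    samples = []
    for line_id in (root / f"{split}.ln").read_text().split():
        img_path = root / "lines" / f"{line_id}.png"
        text = (root / "lines" / f"{line_id}.txt").read_text().strip()
        samples.append((img_path, text))
    return samples

samples = load_split(split="train")
print(len(samples), samples[0])
```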
- We provide convenient and comprehensive commands in ./run/ to train and test on the different datasets, to help researchers reproduce the results of the paper.
If our project is helpful for your research, please consider citing:
@article{li2024htr,
  title={HTR-VT: Handwritten text recognition with vision transformer},
  author={Li, Yuting and Chen, Dexiong and Tang, Tinglong and Shen, Xi},
  journal={Pattern Recognition},
  pages={110967},
  year={2024},
  publisher={Elsevier}
}
We appreciate the public code of VAN and OrigamiNet.