Neural Machine Translation Methods for Sign Language Translation

This repository provides complementary material to aid the replication of the experiments described in the paper Using Neural Machine Translation Methods for Sign Language Translation (Angelova et al., ACL 2022).

Introduction

In this paper, we examine methods and techniques that have proven helpful for the text-to-text translation of spoken languages and apply them to gloss-to-text translation systems, where the glosses are the written representation of the signs. We present one of the first works that include experiments on both parallel corpora of German Sign Language (PHOENIX14T and the Public DGS Corpus). We experiment with two NMT architectures and optimize their hyperparameters, and we compare several tokenization methods and two data augmentation techniques (back-translation and paraphrasing).

Through our investigation we achieve substantial improvements of 5.0 and 2.2 BLEU points for the models trained on the two corpora, respectively. Our RNN models outperform our Transformer models, and BPE is the segmentation method that yields the best results, whereas back-translation and paraphrasing lead to only minor, non-significant improvements.

Content

The repository is structured as follows:

notebooks: Python notebooks containing the code for reproducing the experiments:
- splitting the DGS corpus into training and test sets
- measuring statistics of the DGS corpus
- extracting glosses and the German interpretation from the DGS corpus
- custom tokenization of the DGS corpus
- measuring the overlap between the training, dev and test splits of the DGS corpus
training-scripts: Bash files including SLURM commands to execute Marian-NMT for training the NMT models reported in the paper
data: plain-text versions of the parallel corpora (Phoenix and DGS), including the training, dev and test splits for each corpus. Tokenized, stemmed and augmented versions, as reported in the paper, are also included.
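
As an illustration of what measuring the overlap between splits can look like, here is a minimal sketch. It is not the code from the notebooks: the function names, the exact-match criterion and the toy gloss sequences are all assumptions made for this example. It reports the fraction of unique sentences in one split that also appear verbatim in another.

```python
from typing import Dict, Iterable, Set


def sentence_overlap(reference: Iterable[str], other: Iterable[str]) -> float:
    """Fraction of unique sentences in `other` that also occur in `reference`.

    Hypothetical helper: exact string match after stripping whitespace.
    """
    ref_set: Set[str] = {s.strip() for s in reference if s.strip()}
    other_set: Set[str] = {s.strip() for s in other if s.strip()}
    if not other_set:
        return 0.0
    return len(ref_set & other_set) / len(other_set)


def split_overlaps(
    train: Iterable[str], dev: Iterable[str], test: Iterable[str]
) -> Dict[str, float]:
    """Pairwise overlap between the three splits (hypothetical helper)."""
    train, dev, test = list(train), list(dev), list(test)
    return {
        "dev_in_train": sentence_overlap(train, dev),
        "test_in_train": sentence_overlap(train, test),
        "test_in_dev": sentence_overlap(dev, test),
    }


# Toy gloss sequences invented for this example, not taken from either corpus.
train = ["MORGEN REGEN", "HEUTE SONNE", "NACHT KALT"]
dev = ["HEUTE SONNE", "WIND STARK"]
test = ["SCHNEE BERG", "WIND STARK"]

overlaps = split_overlaps(train, dev, test)
for name, value in overlaps.items():
    print(f"{name}: {value:.2f}")
```

In practice each list would be read from the corresponding plain-text split file, one sentence per line. A high overlap between the test and training splits suggests that the reported scores partly reflect memorized sentences.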

Citation

This work aims to contribute to research on the automatic translation of sign languages. We encourage further research that builds on our work, and we are happy to answer related questions. The code is released under the GPL-3.0 License.

If you use the code or derivatives of this repository, please cite:

@inproceedings{angelova-etal-2022-using,
    title = "Using Neural Machine Translation Methods for Sign Language Translation",
    author = {Angelova, Galina  and
      Avramidis, Eleftherios  and
      M{\"o}ller, Sebastian},
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-srw.21",
    pages = "273--284",
}

More citation formats here

Please note that the corpora (Phoenix 2014T and DGS Corpus) have their own licenses. Any use of them must conform to these licenses and include the appropriate citations.
