Code for the paper "BERT Loses Patience: Fast and Robust Inference with Early Exit".
NEWS: We now have a better and tidier implementation integrated into Hugging Face transformers!
If you use this code in your research, please cite our paper:
@inproceedings{zhou2020bert,
  author = {Zhou, Wangchunshu and Xu, Canwen and Ge, Tao and McAuley, Julian and Xu, Ke and Wei, Furu},
  booktitle = {Advances in Neural Information Processing Systems},
  pages = {18330--18341},
  publisher = {Curran Associates, Inc.},
  title = {BERT Loses Patience: Fast and Robust Inference with Early Exit},
  url = {https://proceedings.neurips.cc/paper/2020/file/d4dd111a4fd973394238aca5c05bebe3-Paper.pdf},
  volume = {33},
  year = {2020}
}
Our code is built on huggingface/transformers. To use it, first clone and install huggingface/transformers.
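A minimal setup sketch, assuming an editable install of the upstream library (paths and versions are up to you):

```bash
# Clone Hugging Face transformers and install it in editable mode
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
```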
You can fine-tune a pretrained language model and train the internal classifiers by configuring and running finetune_bert.sh and finetune_albert.sh.
You can run inference with different patience settings by configuring and running patience_infer_albert.sh and patience_infer_bert.sh.
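A sketch of the corresponding step (the exact patience variable name lives inside the scripts, so open them to set it before running):

```bash
# Set the desired patience value(s) inside the script, then run inference:
bash patience_infer_bert.sh
bash patience_infer_albert.sh
```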
If you'd like to contribute and add more tasks (only GLUE is available at the moment), please submit a pull request and contact me. Also, if you find any problems or bugs, please report them by opening an issue. Thanks!