# Distilling Knowledge for Fast Retrieval-based Chat-bots

Implementation of "Distilling Knowledge for Fast Retrieval-based Chat-bots" (SIGIR 2020) using deep matching transformer networks and knowledge distillation for response retrieval in information-seeking conversational systems.

Preprocessing a dataset:

```bash
cd ./Datasets/ubuntu_data/
ipython ./ubuntu_preprocess.py
```

The ubuntu dataset can be found here.

Training a Bi-Encoder using cross-entropy loss:

```bash
ipython ./main_bert_bi.py -- --dataset ubuntu_data --batch 16 --nmodel BiEncoderDot2 --epochs 1
```
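For orientation, a bi-encoder scores each (context, response) pair by encoding the two sides independently and taking a dot product of their pooled vectors, which is what makes it fast at retrieval time. The sketch below is a minimal illustration assuming Hugging Face `transformers` and PyTorch; the class name `BiEncoderSketch` is hypothetical, and the repository's `BiEncoderDot2` may pool and score differently.

```python
# Illustrative bi-encoder scorer (hypothetical; not the repo's BiEncoderDot2).
import torch
from transformers import AutoModel, AutoTokenizer

class BiEncoderSketch(torch.nn.Module):
    def __init__(self, name="bert-base-uncased"):
        super().__init__()
        self.ctx_enc = AutoModel.from_pretrained(name)   # encodes the dialogue context
        self.rsp_enc = AutoModel.from_pretrained(name)   # encodes candidate responses

    def forward(self, ctx_inputs, rsp_inputs):
        # Use the [CLS] vector of each encoder as a fixed-size embedding.
        ctx_vec = self.ctx_enc(**ctx_inputs).last_hidden_state[:, 0]
        rsp_vec = self.rsp_enc(**rsp_inputs).last_hidden_state[:, 0]
        # Score every context against every response; the diagonal holds the
        # true pairs, so in-batch negatives come for free.
        return ctx_vec @ rsp_vec.T

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BiEncoderSketch()
ctx = tok(["how do i install pip ?"], return_tensors="pt", padding=True)
rsp = tok(["sudo apt-get install python3-pip"], return_tensors="pt", padding=True)
scores = model(ctx, rsp)                          # shape: (num_contexts, num_responses)
loss = torch.nn.functional.cross_entropy(
    scores, torch.arange(scores.size(0)))         # cross-entropy over in-batch candidates
```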

Training an enhanced Cross-Encoder (BECA) using cross-entropy loss:

```bash
ipython ./main_bert_bi.py -- --dataset ubuntu_data --batch 8 --nmodel MolyEncoderAggP2 --epochs 1
```
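By contrast, a cross-encoder packs the context and a candidate response into a single sequence so the transformer can attend across both before scoring, which is more accurate but much slower at inference. The sketch below shows a plain cross-encoder under the same assumptions as above (Hugging Face `transformers`, PyTorch); it is not the paper's enhanced BECA architecture, and `CrossEncoderSketch` is a hypothetical name.

```python
# Illustrative plain cross-encoder (hypothetical; BECA / MolyEncoderAggP2 adds
# the aggregation enhancements described in the paper).
import torch
from transformers import AutoModel, AutoTokenizer

class CrossEncoderSketch(torch.nn.Module):
    def __init__(self, name="bert-base-uncased"):
        super().__init__()
        self.enc = AutoModel.from_pretrained(name)
        self.score = torch.nn.Linear(self.enc.config.hidden_size, 1)

    def forward(self, pair_inputs):
        # Context and response are packed into one sequence, so the
        # transformer attends across both before the scoring head.
        cls = self.enc(**pair_inputs).last_hidden_state[:, 0]
        return self.score(cls).squeeze(-1)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = CrossEncoderSketch()
batch = tok(["how do i install pip ?"],
            ["sudo apt-get install python3-pip"],
            return_tensors="pt", padding=True, truncation=True)
print(model(batch))   # one relevance score per (context, response) pair
```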

Training a Bi-Encoder using cross-entropy loss and knowledge distillation:

```bash
ipython ./main_bi_kd.py -- --dataset ubuntu_data --batch 8 --student_batch 16 --alpha 0.5 --epochs 1 --load 1 --nmodel MolyEncoderAggP2 --nstudentmodel BiEncoderDot2
```

This uses the outputs of a previously trained enhanced Cross-Encoder (BECA) as the teacher when training the Bi-Encoder student. Make sure the hyper-parameters match between the two runs; otherwise the cached teacher outputs will not be found and you will get file-not-found errors. A generic sketch of such a distillation objective follows below.
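As a rough picture of what the distillation objective can look like, the sketch below blends the hard cross-entropy loss on gold labels with a temperature-scaled KL term that pulls the student's (Bi-Encoder) score distribution toward the teacher's (BECA), weighted by `alpha`. This is a generic formulation, not necessarily the exact loss implemented in `main_bi_kd.py`.

```python
# Generic knowledge-distillation loss (illustrative; the exact formulation in
# main_bi_kd.py may differ).
import torch
import torch.nn.functional as F

def distillation_loss(student_scores, teacher_scores, labels, alpha=0.5, T=1.0):
    """Blend cross-entropy on gold labels with a soft loss that pushes the
    student's score distribution toward the teacher's."""
    hard = F.cross_entropy(student_scores, labels)
    soft = F.kl_div(
        F.log_softmax(student_scores / T, dim=-1),
        F.softmax(teacher_scores / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1.0 - alpha) * soft

# Example with in-batch negatives: the i-th response is correct for the i-th context.
student = torch.randn(4, 4)            # bi-encoder (student) score matrix
teacher = torch.randn(4, 4)            # cross-encoder (teacher) score matrix
labels = torch.arange(4)
print(distillation_loss(student, teacher, labels, alpha=0.5))
```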

You can cite the paper as:

Amir Vakili Tahami, Kamyar Ghajar, Azadeh Shakery. Distilling Knowledge for Fast Retrieval-based Chat-bots. In Proceedings of the 43rd International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR 2020).

BibTeX:

```bibtex
@article{tahami2020distilling,
  title={Distilling Knowledge for Fast Retrieval-based Chat-bots},
  author={Amir Vakili Tahami and Kamyar Ghajar and Azadeh Shakery},
  year={2020},
  eprint={2004.11045},
  archivePrefix={arXiv},
}
```