Work for the NTCIR-14 STC-3 Nugget Detection (ND) & Dialogue Quality (DQ) tasks (link)
Paper: arXiv:1907.03070
- /bert: Source code of BERT
- /BertPretrainModel: Pretrained model of BERT
- /PickleCorpus: NTCIR-STC3 corpus preprocessed into embeddings
- /PickleBert: NTCIR-STC3 embeddings preprocessed into BERT format
- /PickleResult: Results of the ND and DQ subtasks
- bert.ipynb: Preprocess NTCIR-STC3 embeddings to BERT format
- param.py: Parameters for the models
- stcevaluation.py: Evaluation methods provided by NTCIR-14
- stctokenizer.py: Tokenizer for this task
There are 3 types of models:
- Word embeddings trained on the NTCIR-STC3 corpus, with a softmax function as the final layer (to fit the NTCIR-14 evaluation)
  - You may download the NTCIR-STC3 word embeddings here
- BERT as sentence embedding, with a softmax function as the final layer
- BERT as sentence embedding, with a CRF as the final layer (ND subtask only)
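The softmax final layer matters because the NTCIR-14 evaluation compares predicted probability distributions rather than single labels. The snippet below is a minimal sketch of that step only, not code from this repository: the `softmax` helper and the example scores are illustrative assumptions.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution that sums to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for five label bins (values are made up).
logits = [0.2, 1.5, -0.3, 0.0, 2.1]
probs = softmax(logits)

print([round(p, 3) for p in probs])
print(sum(probs))
```

The resulting list can be compared bin by bin with a gold distribution, which is what the distribution-based metrics require.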
- NTCIR-STC3 word embedding + softmax (input should be in NTCIR-STC3 word embedding format)
  - Hierarchical_model.ipynb: Main function
  - datahelper.py: Data processing
  - nuggetdetection.py: Model and loss function for ND subtask
  - dialogquality.py: Model and loss function for DQ subtask
  - dialogquality_ndfeature.py: Model and loss function for DQ subtask with ND result as feature
  - stc_train.py: TensorFlow graph
- BERT sentence embedding + softmax (input should be in BERT sentence embedding format)
  - Hierarchical_BERT_model.ipynb: Main function
  - datahelper.py: Data processing
  - nuggetdetectionBERT.py: Model and loss function for ND subtask
  - dialogqualityBERT.py: Model and loss function for DQ subtask
  - dialogquality_ndfeatureBERT.py: Model and loss function for DQ subtask with ND result as feature
  - stc_trainBERT.py: TensorFlow graph
- BERT sentence embedding + CRF (input should be in BERT sentence embedding format)
  - Hierarchical_CRF_model.ipynb: Main function
  - datahelperCRF.py: Data processing for CRF
  - nuggetdetectionCRF.py: Model and loss function for ND subtask (output: CRF)
  - stc_trainCRF.py: TensorFlow graph
Evaluation metrics:
- Dialogue Quality:
  - NMD: Normalised Match Distance
  - RSNOD: Root Symmetric Normalised Order-aware Divergence
- Nugget Detection:
  - RNSS: Root Normalised Sum of Squared errors
  - JSD: Jensen-Shannon divergence
Please check out here
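The four metrics listed above all compare a predicted probability distribution with a gold one. The sketch below follows the commonly cited definitions of these measures for binned distributions; it is not copied from stcevaluation.py, which remains the authoritative implementation. The function names, the renormalisation step, and the use of base-2 logarithms in JSD are assumptions made for illustration.

```python
import math


def normalize(dist):
    """Rescale non-negative values so that they sum to 1."""
    total = sum(dist)
    return [x / total for x in dist]


def nmd(pred, truth):
    """Normalised Match Distance: L1 distance between the two
    cumulative distributions, divided by (number of bins - 1)."""
    pred, truth = normalize(pred), normalize(truth)
    cum_p = cum_t = dist = 0.0
    for p, t in zip(pred, truth):
        cum_p += p
        cum_t += t
        dist += abs(cum_p - cum_t)
    return dist / (len(pred) - 1)


def _order_aware_div(a, b):
    """Mean distance-weighted squared error over the bins where `a` is non-zero."""
    n = len(a)
    weighted = [
        sum(abs(i - j) * (a[j] - b[j]) ** 2 for j in range(n))
        for i in range(n)
        if a[i] > 0
    ]
    return sum(weighted) / len(weighted)


def rsnod(pred, truth):
    """Root Symmetric Normalised Order-aware Divergence."""
    pred, truth = normalize(pred), normalize(truth)
    sod = (_order_aware_div(pred, truth) + _order_aware_div(truth, pred)) / 2
    return math.sqrt(sod / (len(pred) - 1))


def rnss(pred, truth):
    """Root Normalised Sum of Squared errors."""
    pred, truth = normalize(pred), normalize(truth)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / 2)


def jsd(pred, truth):
    """Jensen-Shannon divergence (base-2 logarithm assumed here)."""
    pred, truth = normalize(pred), normalize(truth)
    mid = [(p + t) / 2 for p, t in zip(pred, truth)]

    def kld(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return (kld(pred, mid) + kld(truth, mid)) / 2


# Made-up five-bin distributions, for illustration only.
pred = [0.1, 0.2, 0.4, 0.2, 0.1]
gold = [0.0, 0.1, 0.5, 0.3, 0.1]
print(nmd(pred, gold), rsnod(pred, gold))  # order-aware, used for DQ
print(rnss(pred, gold), jsd(pred, gold))   # bin-by-bin, used for ND
```

Under these definitions every metric is 0 for identical distributions and lower is better. NMD and RSNOD take the order of the bins into account, while RNSS and JSD treat the bins as unordered labels.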