NTCIR-STC3-NDDQsubtask

Code for the NTCIR-14 STC-3 Nugget Detection (ND) and Dialogue Quality (DQ) subtasks (link)
Paper: arXiv:1907.03070

Directories

There are three types of models:

Word embeddings trained on the NTCIR-STC3 corpus, with a softmax function as the final layer (to match the NTCIR-14 evaluation, which scores predicted probability distributions)
- You may download the NTCIR-STC3 word embedding here
BERT sentence embeddings, with a softmax function as the final layer
BERT sentence embeddings, with a CRF as the final layer (ND subtask only)
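
As a rough illustration of why a softmax final layer suits this evaluation, the sketch below turns raw scores for one dialogue turn into a probability distribution over nugget labels. It is not code from this repository: the label names and the logits are illustrative assumptions, and the real models compute the scores inside a TensorFlow graph.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed label set for a customer turn (illustrative only).
CUSTOMER_LABELS = ["CNUG0", "CNUG", "CNUG*", "CNaN"]

# Hypothetical logits for a single customer turn.
logits = [2.0, 0.5, -1.0, 0.1]
probs = softmax(logits)

for label, p in zip(CUSTOMER_LABELS, probs):
    print(f"{label}: {p:.3f}")
```

The output sums to 1, so it can be compared directly against a gold distribution built from annotator votes.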

NTCIR-STC3 word embedding + softmax (input must be in NTCIR-STC3 word embedding format)
- Hierarchical_model.ipynb: Main function
- datahelper.py: Data processing
- nuggetdetection.py: Model and loss function for ND subtask
- dialogquality.py: Model and loss function for DQ subtask
- dialogquality_ndfeature.py: Model and loss function for DQ subtask with ND result as feature
- stc_train.py: TensorFlow graph
BERT sentence embedding + softmax (input must be in BERT sentence embedding format)
- Hierarchical_BERT_model.ipynb: Main function
- datahelper.py: Data processing
- nuggetdetectionBERT.py: Model and loss function for ND subtask
- dialogquality_ndfeatureBERT.py: Model and loss function for DQ subtask with ND result as feature
- dialogqualityBERT.py: Model and loss function for DQ subtask
- stc_trainBERT.py: TensorFlow graph
BERT sentence embedding + CRF (input must be in BERT sentence embedding format)
- Hierarchical_CRF_model.ipynb: Main function
- datahelperCRF.py: Data processing for CRF
- nuggetdetectionCRF.py: Model and loss function for ND subtask (Output: CRF)
- stc_trainCRF.py: TensorFlow graph

Dialogue Quality:
- NMD: Normalised Match Distance
- RSNOD: Root Symmetric Normalised Order-aware Divergence
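
The two Dialogue Quality measures compare a predicted distribution with a gold distribution over ordered score bins, so they penalise probability mass that lands far from where the gold mass is. The sketch below follows the commonly published definitions of these measures as I understand them. It is not the official evaluation script, and the function names are hypothetical.

```python
import math

def nmd(pred, gold):
    """Normalised Match Distance: summed absolute difference between
    the two cumulative distributions, divided by (number of bins - 1)."""
    assert len(pred) == len(gold) and len(pred) > 1
    cum_p = cum_g = 0.0
    match_distance = 0.0
    for p, g in zip(pred, gold):
        cum_p += p
        cum_g += g
        match_distance += abs(cum_p - cum_g)
    return match_distance / (len(pred) - 1)

def _order_aware_divergence(p, q):
    """OD(p || q): distance-weighted squared error, averaged over
    the bins where q has non-zero probability."""
    bins = [i for i, qi in enumerate(q) if qi > 0]
    total = 0.0
    for i in bins:
        total += sum(abs(i - j) * (p[j] - q[j]) ** 2 for j in range(len(p)))
    return total / len(bins)

def rsnod(pred, gold):
    """Root Symmetric Normalised Order-aware Divergence."""
    assert len(pred) == len(gold) and len(pred) > 1
    sod = (_order_aware_divergence(pred, gold)
           + _order_aware_divergence(gold, pred)) / 2
    return math.sqrt(sod / (len(pred) - 1))

# Hypothetical five-bin example (for instance, scores from -2 to 2).
gold = [0.0, 0.0, 0.0, 0.0, 1.0]
worst = [1.0, 0.0, 0.0, 0.0, 0.0]
print(nmd(worst, gold), rsnod(worst, gold))
```

Under these definitions both measures are 0 for identical distributions and 1 when all the mass sits at opposite ends of the scale.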
Nugget Detection:
- RNSS: Root Normalised Sum of Squared errors
- JSD: Jensen-Shannon divergence
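
The two Nugget Detection measures treat the labels as unordered classes and compare the predicted and gold distributions bin by bin. The following is a minimal sketch under stated assumptions, not the official evaluation script: the function names are hypothetical, and the base-2 logarithm in JSD is an assumption chosen so the value falls in [0, 1].

```python
import math

def rnss(pred, gold):
    """Root Normalised Sum of Squared errors: sqrt(sum of squared
    differences / 2), which lies in [0, 1] for probability distributions."""
    assert len(pred) == len(gold)
    ss = sum((p - g) ** 2 for p, g in zip(pred, gold))
    return math.sqrt(ss / 2)

def _kl(p, m):
    """KL divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)

def jsd(pred, gold):
    """Jensen-Shannon divergence: mean KL divergence of each
    distribution from their midpoint (base-2 logs assumed)."""
    assert len(pred) == len(gold)
    mid = [(p + g) / 2 for p, g in zip(pred, gold)]
    return (_kl(pred, mid) + _kl(gold, mid)) / 2

# Hypothetical four-label example with no overlap between the two.
gold = [0.0, 0.0, 0.0, 1.0]
pred = [1.0, 0.0, 0.0, 0.0]
print(rnss(pred, gold), jsd(pred, gold))
```

Both functions return 0 for identical distributions and 1 for distributions with no overlapping mass, as in the example.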

Please check out here