A TensorFlow implementation of IJCAI 2018 paper "Neural Networks Incorporating Unlabeled and Partially-labeled Data for Cross-domain Chinese Word Segmentation".
Figure: Architecture of our proposed model. It mainly includes three components: the forward language model (pink), the backward language model (yellow), and the BiLSTM segmentation model (blue). We use a gate mechanism to control the influence of the language models on the segmentation model. The outputs of the language models are omitted for simplicity. In this example, we assume that “c1c2c3” is a word.
- Python: 2.7
- TensorFlow >= 1.4.1 (version 1.4.1 was used for the experiments in our paper)
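Since the code targets a specific TensorFlow release, it can help to verify the installed version before running anything. The helper below is a hypothetical sketch (not part of this repo) that compares dot-separated version strings against the 1.4.1 minimum:

```python
# Hypothetical sanity check (not part of this repo): compare an installed
# TensorFlow version string against the minimum used in the paper (1.4.1).
def meets_min_version(version, minimum="1.4.1"):
    """Return True if `version` (e.g. "1.4.1") is >= `minimum`.

    Compares numeric dot-separated components; pre-release suffixes
    such as "-rc0" are stripped before comparing.
    """
    def parts(v):
        return tuple(int(p) for p in v.split("-")[0].split("."))
    return parts(version) >= parts(minimum)

# Example usage against a live installation:
# import tensorflow as tf
# assert meets_min_version(tf.__version__), "TensorFlow >= 1.4.1 required"
```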
- Build vocabulary:
python utils_data.py
- Train a model:
python train.py --model lstmlm --source ctb --target zx --pl True --memory 1.0
- Test a model:
python test.py --model lstmlm --source ctb --target zx --pl True --memory 1.0
- Evaluate a model:
python eval.py ctb zx lstmlm_ctb_True
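The four steps above can be chained into one driver script. The sketch below only assembles the commands shown in this README; the checkpoint name passed to `eval.py` (here `<model>_<source>_<pl>`) is an assumption inferred from the example command and may differ in your setup:

```python
# Hypothetical driver sketch: assembles the vocab -> train -> test -> eval
# commands from this README. The eval argument "<model>_<source>_<pl>" is
# inferred from the example command above and is an assumption.
def pipeline_commands(model="lstmlm", source="ctb", target="zx",
                      pl="True", memory="1.0"):
    common = ["--model", model, "--source", source, "--target", target,
              "--pl", pl, "--memory", memory]
    return [
        ["python", "utils_data.py"],        # build vocabulary
        ["python", "train.py"] + common,    # train a model
        ["python", "test.py"] + common,     # test a model
        ["python", "eval.py", source, target,
         "%s_%s_%s" % (model, source, pl)], # evaluate predictions
    ]

if __name__ == "__main__":
    for cmd in pipeline_commands():
        # Dry run: print each command. To actually execute the pipeline,
        # import subprocess and replace print with subprocess.check_call(cmd).
        print(" ".join(cmd))
```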
If you find the code helpful, please cite the following paper:
Lujun Zhao, Qi Zhang, Peng Wang and Xiaoyu Liu, Neural Networks Incorporating Unlabeled and Partially-labeled Data for Cross-domain Chinese Word Segmentation, In Proceedings of the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence (IJCAI-ECAI-18), July 9-19, 2018, Stockholm, Sweden.
@InProceedings{zhao2018cws,
  author    = {Zhao, Lujun and Zhang, Qi and Wang, Peng and Liu, Xiaoyu},
  title     = {Neural Networks Incorporating Unlabeled and Partially-labeled Data for Cross-domain Chinese Word Segmentation},
  booktitle = {Proceedings of the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence (IJCAI-ECAI-18)},
  year      = {2018},
  address   = {Stockholm, Sweden}
}