This is an implementation of the DeNSe (Dependency Neural Selection) parser described in Dependency Parsing as Head Selection.
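At its core, the parser treats parsing as head selection: an encoder (a BiLSTM in the paper) produces a state for each word, and every word independently picks its most likely head among all positions, including an artificial ROOT. The sketch below illustrates the idea in plain Torch/Lua; the dimensions and random parameters are purely illustrative, and this is not the repo's SelectNetPos model.

require 'torch'
-- toy head selection: s(j, i) = v' * tanh(U*h_j + W*h_i); head(i) = argmax_j s(j, i)
-- H holds encoder states for ROOT (row 1) and four words; random here for illustration
local d, n = 4, 5
local H = torch.randn(n, d)
local U, W, v = torch.randn(d, d), torch.randn(d, d), torch.randn(d)
for i = 2, n do                  -- each word selects a head
  local best, head = -math.huge, 0
  for j = 1, n do                -- candidate heads, including ROOT
    if j ~= i then
      local s = v:dot(torch.tanh(U * H[j] + W * H[i]))
      if s > best then best, head = s, j end
    end
  end
  print(('word %d -> head %d'):format(i - 1, head))
end
-- note: the per-word argmax need not form a tree; an MST pass (see below) repairs that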
Besides Torch itself, you may also need to install some additional Torch components:
luarocks install nn
luarocks install nngraph
luarocks install cutorch
luarocks install cunn
The parser was developed with an old version of Torch (circa February 2016).
The parser parses text in CoNLL-X format (note that POS tags must be provided). If a gold standard file is supplied via --gold, the parser will also print UAS and LAS.
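For reference, CoNLL-X input has one token per line with ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and a blank line between sentences; the POS columns must be filled, while HEAD and DEPREL are what the parser predicts (and what --gold is compared against). An illustrative sentence (columns space-aligned here for readability):

1  Economic  _  ADJ   JJ   _  2  amod    _  _
2  news      _  NOUN  NN   _  3  nsubj   _  _
3  spread    _  VERB  VBD  _  0  root    _  _
4  quickly   _  ADV   RB   _  3  advmod  _  _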
CUDA_VISIBLE_DEVICES=3 th dense_parser.lua --modelPath $model --classifierPath $classifier \
--input $input --output $output --gold $input --mstalg Eisner
Feel free to try the scripts in experiments/run_parser.
To split off a development set, please refer to the main function of conllx_scripts/split_dev.lua.
You need to convert the GloVe vectors from text format to Torch's .t7 format. For usage details, run:
th conllx_scripts/extract_embed.lua -h
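The conversion itself is done by the script above; conceptually it amounts to something like the sketch below (the file names and the layout of the saved table are illustrative assumptions, not necessarily what extract_embed.lua produces).

require 'torch'
-- read "word v1 v2 ... vD" lines and save the vocabulary plus a (#words x D) matrix
local words, vecs = {}, {}
for line in io.lines('glove.6B.300d.txt') do   -- assumed input file name
  local parts = {}
  for tok in line:gmatch('%S+') do parts[#parts + 1] = tok end
  words[#words + 1] = parts[1]
  local v = torch.Tensor(#parts - 1)
  for k = 2, #parts do v[k - 1] = tonumber(parts[k]) end
  vecs[#vecs + 1] = v
end
local E = torch.Tensor(#words, vecs[1]:size(1))
for k = 1, #words do E[k]:copy(vecs[k]) end
torch.save('glove.6B.300d.t7', {words = words, embed = E})  -- assumed output layout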
We use Czech as a running example; the workflow is the same for other languages.
First, train the model with the Adam algorithm using the script experiments/czech/train.sh
CUDA_VISIBLE_DEVICES=$ID th train.lua --useGPU \
--model SelectNetPos \
--seqLen 112 \
--maxTrainLen 110 \
--freqCut 1 \
--nhid 300 \
--nin 300 \
--nlayers 2 \
--dropout 0.35 \
--recDropout 0.1 \
--lr $lr \
--train $train \
--valid $valid \
--test $test \
--optimMethod Adam \
--save $model \
--batchSize 20 \
--validBatchSize 20 \
--maxEpoch 15 \
--npin 40 \
--evalType conllx \
| tee $log
Once Adam converges, we switch to plain SGD using experiments/czech/tune.sh, which usually gives a slight further improvement.
CUDA_VISIBLE_DEVICES=$ID th post_train.lua \
--load $load \
--save $model \
--lr $lr \
--maxEpoch 10 \
--optimMethod SGD \
| tee $log
Lastly, we use an MST algorithm to fix non-tree outputs with experiments/czech/mst-post.sh
CUDA_VISIBLE_DEVICES=3 th mst_postprocess.lua \
--modelPath $model \
--mstalg ChuLiuEdmonds \
--validout $validout \
--testout $testout | tee $log
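This post-processing step is needed because per-word argmax head selection can produce graphs that are not trees; ChuLiuEdmonds recovers the highest-scoring spanning arborescence (use Eisner if you need strictly projective trees). A tiny illustration of the failure mode in plain Lua (not the repo's decoder):

-- head[i] = selected head of word i (0 = ROOT); words 1..3 form a cycle here
local head = {2, 3, 1, 0}
local function hasCycle(h)
  for i = 1, #h do
    local seen, j = {}, i          -- follow head pointers from word i
    while j ~= 0 do
      if seen[j] then return true end
      seen[j] = true
      j = h[j]
    end
  end
  return false
end
print(hasCycle(head))  -- true: this output must be repaired by the MST pass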
Based on the trained unlabeled parser, we first generate training data for the labeled parser with experiments/czech/gen_lbl_train.sh
CUDA_VISIBLE_DEVICES=3 th train_labeled.lua --mode generate \
--modelPath $model \
--outTrainDataPath $outTrain \
--inTrain $inTrain \
--inValid $inValid \
--inTest $inTest \
--outValid $outValid \
--outTest $outTest \
--language Other | tee $log
Then we train the labeled parser (actually an MLP) with experiments/czech/run_lbl.sh
CUDA_VISIBLE_DEVICES=3 th train_labeled.lua --mode train \
--useGPU \
--snhids "1880,800,800,82" \
--activ relu \
--lr 0.01 \
--optimMethod AdaGrad \
--dropout 0.5 \
--inDropout 0.05 \
--batchSize 256 \
--maxEpoch 20 \
--ftype "|x|xe|xpe|" \
--dataset $dataset \
--inTrain $inTrain \
--inValid $inValid \
--inTest $inTest \
--language Other \
--save $model | tee $log
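The flags above determine the classifier's shape: --snhids "1880,800,800,82" gives an MLP from 1880-dimensional head/dependent features (selected by --ftype) through two 800-unit ReLU layers to scores over 82 dependency labels. Below is a hedged nn sketch of such a network; the output layer and training criterion are assumptions, not read from the repo's code.

require 'nn'
local mlp = nn.Sequential()
mlp:add(nn.Dropout(0.05))      -- input dropout (--inDropout 0.05)
mlp:add(nn.Linear(1880, 800))
mlp:add(nn.ReLU())             -- --activ relu
mlp:add(nn.Dropout(0.5))       -- --dropout 0.5
mlp:add(nn.Linear(800, 800))
mlp:add(nn.ReLU())
mlp:add(nn.Dropout(0.5))
mlp:add(nn.Linear(800, 82))
mlp:add(nn.LogSoftMax())       -- assumed; would pair with nn.ClassNLLCriterion
print(mlp:forward(torch.randn(1880)):size())  -- 82 label scores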
https://drive.google.com/file/d/0B6-YKFW-MnbOVjlQNmlYWTFPT2c/view?usp=sharing
https://drive.google.com/file/d/0B6-YKFW-MnbOMjdXSVlKTkFwR0E/view?usp=sharing
@InProceedings{zhang-cheng-lapata:2017:EACLlong,
author = {Zhang, Xingxing and Cheng, Jianpeng and Lapata, Mirella},
title = {Dependency Parsing as Head Selection},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers},
month = {April},
year = {2017},
address = {Valencia, Spain},
publisher = {Association for Computational Linguistics},
pages = {665--676},
url = {http://www.aclweb.org/anthology/E17-1063}
}