An implementation of "Deep Biaffine Attention for Neural Dependency Parsing" (Dozat and Manning, 2017).
Details and hyperparameter choices almost exactly follow those described in the paper, except for some training settings. We also do not provide a decoding algorithm to guarantee the well-formedness of output trees; in practice this does not noticeably affect the results.
Another version of the implementation is available on the `char` branch; it replaces the tag embedding with a character-level LSTM and achieves better performance.
- `python == 3.7.0`
- `pytorch == 1.0.0`
The model is evaluated on the Stanford Dependency conversion (v3.3.0) of the English Penn Treebank, with POS tags predicted by the Stanford POS tagger.
We follow the conventional data splits:
- Train: 02-21 (39,832 sentences)
- Dev: 22 (1,700 sentences)
- Test: 23 (2,416 sentences)
|               | UAS   | LAS   |
|---------------|-------|-------|
| tag embedding | 95.87 | 94.19 |
| char lstm     | 96.17 | 94.53 |
Note that punctuation is excluded from all evaluation metrics.
Aside from using consistent hyperparameters, there are a few key points that significantly affect the performance:
- Dividing the pretrained embedding by its standard deviation
- Applying the same dropout mask at every recurrent timestep
- Jointly dropping the words and tags
For the above reasons, we have to give up some native PyTorch modules (e.g., `LSTM` and `Dropout`) and use self-implemented ones instead.
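As a minimal sketch of what such a self-implemented module might look like, here is a dropout layer that samples one mask per sequence and reuses it at every recurrent timestep (the class name `SharedDropout` and the tensor shapes are illustrative assumptions, not the repo's actual code):

```python
import torch
import torch.nn as nn

class SharedDropout(nn.Module):
    """Dropout that reuses a single mask across all timesteps.

    Standard nn.Dropout resamples the mask at every position; here one
    Bernoulli mask of shape [batch, 1, hidden] is sampled per sequence
    and broadcast over the time dimension.
    """

    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):
        # x: [batch, seq_len, hidden]
        if not self.training or self.p == 0:
            return x
        # one mask per sequence, shared by every timestep via broadcasting
        mask = x.new_empty(x.shape[0], 1, x.shape[2]).bernoulli_(1 - self.p)
        # rescale so the expected activation is unchanged (inverted dropout)
        return x * mask / (1 - self.p)
```

The first point above is then just a one-liner when loading the vectors, e.g. `embed /= torch.std(embed)`.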
As shown above, our results, especially those of the char lstm version, outperform the official implementation (95.74 UAS and 94.08 LAS).
You can start the training, evaluation, and prediction processes via the subcommands registered in `parser.commands`.
```sh
$ python run.py -h
usage: run.py [-h] {evaluate,predict,train} ...

Create the Biaffine Parser model.

optional arguments:
  -h, --help            show this help message and exit

Commands:
  {evaluate,predict,train}
    evaluate            Evaluate the specified model and dataset.
    predict             Use a trained model to make predictions.
    train               Train a model.
```
Before running any subcommand, please make sure that the data files are in CoNLL-X format. If some fields are missing, you can fill them with underscores as placeholders.
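For reference, CoNLL-X uses ten tab-separated columns per token (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL). A toy sentence with the unused fields filled by underscores looks like this:

```
1	She	_	_	PRP	_	2	nsubj	_	_
2	enjoys	_	_	VBZ	_	0	root	_	_
3	reading	_	_	VBG	_	2	xcomp	_	_
4	.	_	_	.	_	2	punct	_	_
```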
Optional arguments of the subparsers are as follows:
```sh
$ python run.py train -h
usage: run.py train [-h] [--ftrain FTRAIN] [--fdev FDEV] [--ftest FTEST]
                    [--fembed FEMBED] [--device DEVICE] [--seed SEED]
                    [--threads THREADS] [--file FILE] [--vocab VOCAB]

optional arguments:
  -h, --help            show this help message and exit
  --ftrain FTRAIN       path to train file
  --fdev FDEV           path to dev file
  --ftest FTEST         path to test file
  --fembed FEMBED       path to pretrained embedding file
  --device DEVICE, -d DEVICE
                        ID of GPU to use
  --seed SEED, -s SEED  seed for generating random numbers
  --threads THREADS, -t THREADS
                        max num of threads
  --file FILE, -f FILE  path to model file
  --vocab VOCAB, -v VOCAB
                        path to vocabulary file
```
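For example, a training run might look like the following (the data and output paths here are placeholders; substitute your own files):

```sh
$ python run.py train --ftrain data/train.conllx \
                      --fdev data/dev.conllx \
                      --ftest data/test.conllx \
                      --fembed data/glove.6B.100d.txt \
                      --file model.pt \
                      --vocab vocab.pt \
                      --device 0
```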
```sh
$ python run.py evaluate -h
usage: run.py evaluate [-h] [--batch-size BATCH_SIZE] [--include-punct]
                       [--fdata FDATA] [--device DEVICE] [--seed SEED]
                       [--threads THREADS] [--file FILE] [--vocab VOCAB]

optional arguments:
  -h, --help            show this help message and exit
  --batch-size BATCH_SIZE
                        batch size
  --include-punct       whether to include punctuation
  --fdata FDATA         path to dataset
  --device DEVICE, -d DEVICE
                        ID of GPU to use
  --seed SEED, -s SEED  seed for generating random numbers
  --threads THREADS, -t THREADS
                        max num of threads
  --file FILE, -f FILE  path to model file
  --vocab VOCAB, -v VOCAB
                        path to vocabulary file
```
```sh
$ python run.py predict -h
usage: run.py predict [-h] [--batch-size BATCH_SIZE] [--fdata FDATA]
                      [--fpred FPRED] [--device DEVICE] [--seed SEED]
                      [--threads THREADS] [--file FILE] [--vocab VOCAB]

optional arguments:
  -h, --help            show this help message and exit
  --batch-size BATCH_SIZE
                        batch size
  --fdata FDATA         path to dataset
  --fpred FPRED         path to predicted result
  --device DEVICE, -d DEVICE
                        ID of GPU to use
  --seed SEED, -s SEED  seed for generating random numbers
  --threads THREADS, -t THREADS
                        max num of threads
  --file FILE, -f FILE  path to model file
  --vocab VOCAB, -v VOCAB
                        path to vocabulary file
```