Biaffine Parser


An implementation of "Deep Biaffine Attention for Neural Dependency Parsing".

Details and hyperparameter choices are almost identical to those described in the paper, apart from a few training settings. We also do not provide a decoding algorithm that guarantees well-formed trees; this does not seriously affect the results.
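To make the remark about decoding concrete, the sketch below picks each word's head greedily and then checks whether the result is a well-formed tree. It is an illustration only, not the code in this repository: `greedy_heads`, `is_well_formed`, the score layout (`arc_scores[i][j]` scores token `j` as the head of token `i`, with index 0 as an artificial root) and the single-root requirement are all assumptions.

```python
from typing import List


def greedy_heads(arc_scores: List[List[float]]) -> List[int]:
    """Pick the highest-scoring head for every word independently.

    arc_scores[i][j] is the score of token j being the head of token i.
    Row 0 belongs to the artificial root and is ignored.
    The returned list covers tokens 1..n (heads[0] is the head of token 1).
    """
    heads = []
    for row in arc_scores[1:]:
        heads.append(max(range(len(row)), key=row.__getitem__))
    return heads


def is_well_formed(heads: List[int]) -> bool:
    """Return True if heads form a tree (assumed: one root child, no cycles)."""
    # Assumption: exactly one token attaches to the artificial root.
    if sum(1 for h in heads if h == 0) != 1:
        return False
    # Follow head pointers from every token; revisiting a token means a cycle.
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


# Hypothetical scores for a three-word sentence (row/column 0 is the root).
scores = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.8, 0.1],
    [0.9, 0.05, 0.0, 0.05],
    [0.1, 0.1, 0.7, 0.1],
]
heads = greedy_heads(scores)
print(heads, is_well_formed(heads))
```

Greedy selection gives no guarantee that the check passes; a tree-constrained decoder would be needed for that.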

Another version of the implementation is available on the char branch; it replaces the tag embedding with a character-level LSTM and achieves better performance.


python == 3.7.0
pytorch == 1.0.0


The model is evaluated on the Stanford Dependency conversion (v3.3.0) of the English Penn Treebank, with POS tags predicted by the Stanford POS tagger.

For all datasets, we follow the conventional data splits:

  • Train: 02-21 (39,832 sentences)
  • Dev: 22 (1,700 sentences)
  • Test: 23 (2,416 sentences)


|               | UAS   | LAS   |
| ------------- | ----- | ----- |
| tag embedding | 95.87 | 94.19 |
| char lstm     | 96.17 | 94.53 |

Note that punctuation is excluded from all evaluation metrics.
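As a rough illustration of how attachment scores can be computed with punctuation left out, consider the sketch below. The function names and the punctuation test (every character of the token falls in a Unicode punctuation category) are assumptions made for the example; the repository may decide what counts as punctuation differently.

```python
import unicodedata
from typing import Sequence, Tuple


def is_punct(word: str) -> bool:
    """Assumed test: a non-empty token made only of Unicode punctuation."""
    return bool(word) and all(
        unicodedata.category(ch).startswith("P") for ch in word
    )


def attachment_scores(
    words: Sequence[str],
    gold_heads: Sequence[int],
    gold_rels: Sequence[str],
    pred_heads: Sequence[int],
    pred_rels: Sequence[str],
    include_punct: bool = False,
) -> Tuple[float, float]:
    """Return (UAS, LAS) in percent over the tokens that are scored."""
    total = correct_arcs = correct_rels = 0
    rows = zip(words, gold_heads, gold_rels, pred_heads, pred_rels)
    for word, g_head, g_rel, p_head, p_rel in rows:
        if not include_punct and is_punct(word):
            continue  # punctuation tokens are not scored
        total += 1
        if g_head == p_head:
            correct_arcs += 1
            # A label only counts when the head is also correct.
            if g_rel == p_rel:
                correct_rels += 1
    if total == 0:
        return 0.0, 0.0
    return 100.0 * correct_arcs / total, 100.0 * correct_rels / total


# Hypothetical sentence with one wrong label and one wrong punctuation head.
words = ["She", "reads", "books", "."]
gold_heads, gold_rels = [2, 0, 2, 2], ["nsubj", "root", "dobj", "punct"]
pred_heads, pred_rels = [2, 0, 2, 3], ["nsubj", "root", "nmod", "punct"]
uas, las = attachment_scores(words, gold_heads, gold_rels, pred_heads, pred_rels)
print(f"UAS {uas:.2f}  LAS {las:.2f}")
```

Passing `include_punct=True` mirrors the `--include-punct` switch of the evaluate subcommand in spirit, and lowers both scores in this example because the final period is attached to the wrong head.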

Aside from using hyperparameters consistent with the paper, a few implementation details significantly affect performance:

  • Dividing the pretrained embedding by its standard-deviation
  • Applying the same dropout mask at every recurrent timestep
  • Jointly dropping the words and tags

For these reasons, some native PyTorch modules (e.g., LSTM and Dropout) have to be replaced with custom implementations.
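The three points above can be sketched in PyTorch roughly as follows. This is a minimal illustration under stated assumptions, not the repository's modules: the names `scale_pretrained`, `SharedDropout` and `IndependentDropout` are hypothetical, the rescaling rule in `IndependentDropout` is one plausible choice, and `SharedDropout` only covers layer inputs and outputs (reusing a mask on the hidden-to-hidden connection needs a custom LSTM loop, which is not shown).

```python
import torch
import torch.nn as nn


def scale_pretrained(embed: torch.Tensor) -> torch.Tensor:
    """Divide a pretrained embedding matrix by its overall standard deviation."""
    return embed / embed.std()


class SharedDropout(nn.Module):
    """Sample one dropout mask per sequence and reuse it at every timestep."""

    def __init__(self, p: float = 0.33):
        super().__init__()
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq_len, hidden]
        if not self.training or self.p == 0:
            return x
        # The mask has a singleton time axis, so it broadcasts over timesteps.
        mask = x.new_empty(x.size(0), 1, x.size(2)).bernoulli_(1 - self.p)
        return x * mask / (1 - self.p)


class IndependentDropout(nn.Module):
    """Drop whole word or tag vectors per token and rescale the survivor."""

    def __init__(self, p: float = 0.33):
        super().__init__()
        self.p = p

    def forward(self, word: torch.Tensor, tag: torch.Tensor):
        # word, tag: [batch, seq_len, dim]
        if not self.training or self.p == 0:
            return word, tag
        word_mask = word.new_empty(word.shape[:2]).bernoulli_(1 - self.p)
        tag_mask = tag.new_empty(tag.shape[:2]).bernoulli_(1 - self.p)
        # Assumed rule: if only one of the two inputs survives, double it.
        kept = word_mask + tag_mask
        scale = 2.0 / kept.clamp(min=1.0)
        word_mask = (word_mask * scale).unsqueeze(-1)
        tag_mask = (tag_mask * scale).unsqueeze(-1)
        return word * word_mask, tag * tag_mask


# Small demonstration on random data.
torch.manual_seed(0)
pretrained = torch.randn(50, 8) * 3
print(scale_pretrained(pretrained).std())

dropout = SharedDropout(p=0.5)
out = dropout(torch.ones(2, 5, 16))
print(torch.equal(out[:, 0], out[:, 4]))  # identical mask at every timestep
```

`nn.Dropout` would instead draw a fresh mask for each timestep and each feature, which is why a custom module is needed for this behaviour.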

As shown above, our results, especially those of the char lstm version, outperform the official implementation (95.74 UAS and 94.08 LAS).


You can start the training, evaluation and prediction process by using subcommands registered in parser.commands.

$ python -h
usage: [-h] {evaluate,predict,train} ...

Create the Biaffine Parser model.

optional arguments:
  -h, --help            show this help message and exit

    evaluate            Evaluate the specified model and dataset.
    predict             Use a trained model to make predictions.
    train               Train a model.

Before running a subcommand, make sure that the data files are in CoNLL-X format. If some fields are missing, use underscores as placeholders.
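For reference, a CoNLL-X file has ten tab-separated columns per token and a blank line between sentences. The reader below is a minimal sketch for checking that a file has this shape; `read_conllx` and the sample sentences are hypothetical and not part of this repository.

```python
from typing import Dict, Iterable, Iterator, List

# The ten CoNLL-X columns, in order.
CONLLX_FIELDS = (
    "ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
    "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL",
)


def read_conllx(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield sentences as lists of {field: value} dicts, one dict per token."""
    sentence: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLX_FIELDS):
            raise ValueError(
                f"expected {len(CONLLX_FIELDS)} columns, got {len(columns)}"
            )
        sentence.append(dict(zip(CONLLX_FIELDS, columns)))
    if sentence:
        yield sentence


# Missing fields (lemma, features, projective head/label) are underscores.
sample = (
    "1\tShe\t_\tPRP\tPRP\t_\t2\tnsubj\t_\t_\n"
    "2\treads\t_\tVBZ\tVBZ\t_\t0\troot\t_\t_\n"
    "\n"
    "1\tHi\t_\tUH\tUH\t_\t0\troot\t_\t_\n"
)
for sent in read_conllx(sample.splitlines()):
    print([(tok["FORM"], tok["HEAD"], tok["DEPREL"]) for tok in sent])
```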

Optional arguments of the subparsers are as follows:

$ python train -h
usage: train [-h] [--ftrain FTRAIN] [--fdev FDEV] [--ftest FTEST]
                    [--fembed FEMBED] [--device DEVICE] [--seed SEED]
                    [--threads THREADS] [--file FILE] [--vocab VOCAB]

optional arguments:
  -h, --help            show this help message and exit
  --ftrain FTRAIN       path to train file
  --fdev FDEV           path to dev file
  --ftest FTEST         path to test file
  --fembed FEMBED       path to pretrained embedding file
  --device DEVICE, -d DEVICE
                        ID of GPU to use
  --seed SEED, -s SEED  seed for generating random numbers
  --threads THREADS, -t THREADS
                        max num of threads
  --file FILE, -f FILE  path to model file
  --vocab VOCAB, -v VOCAB
                        path to vocabulary file

$ python evaluate -h
usage: evaluate [-h] [--batch-size BATCH_SIZE] [--include-punct]
                       [--fdata FDATA] [--device DEVICE] [--seed SEED]
                       [--threads THREADS] [--file FILE] [--vocab VOCAB]

optional arguments:
  -h, --help            show this help message and exit
  --batch-size BATCH_SIZE
                        batch size
  --include-punct       whether to include punctuation
  --fdata FDATA         path to dataset
  --device DEVICE, -d DEVICE
                        ID of GPU to use
  --seed SEED, -s SEED  seed for generating random numbers
  --threads THREADS, -t THREADS
                        max num of threads
  --file FILE, -f FILE  path to model file
  --vocab VOCAB, -v VOCAB
                        path to vocabulary file

$ python predict -h
usage: predict [-h] [--batch-size BATCH_SIZE] [--fdata FDATA]
                      [--fpred FPRED] [--device DEVICE] [--seed SEED]
                      [--threads THREADS] [--file FILE] [--vocab VOCAB]

optional arguments:
  -h, --help            show this help message and exit
  --batch-size BATCH_SIZE
                        batch size
  --fdata FDATA         path to dataset
  --fpred FPRED         path to predicted result
  --device DEVICE, -d DEVICE
                        ID of GPU to use
  --seed SEED, -s SEED  seed for generating random numbers
  --threads THREADS, -t THREADS
                        max num of threads
  --file FILE, -f FILE  path to model file
  --vocab VOCAB, -v VOCAB
                        path to vocabulary file


| Param         | Description                                                                            | Value      |
| ------------- | -------------------------------------------------------------------------------------- | ---------- |
| n_embed       | dimension of word embedding                                                            | 100        |
| n_tag_embed   | dimension of tag embedding                                                             | 100        |
| embed_dropout | dropout ratio of embeddings                                                            | 0.33       |
| n_lstm_hidden | dimension of lstm hidden state                                                         | 400        |
| n_lstm_layers | number of lstm layers                                                                  | 3          |
| lstm_dropout  | dropout ratio of lstm                                                                  | 0.33       |
| n_mlp_arc     | arc mlp size                                                                           | 500        |
| n_mlp_rel     | label mlp size                                                                         | 100        |
| mlp_dropout   | dropout ratio of mlp                                                                   | 0.33       |
| lr            | starting learning rate of training                                                     | 2e-3       |
| betas         | Adam coefficients for the running averages of the gradient (momentum) and its square (L2 norm) | (0.9, 0.9) |
| epsilon       | stability constant                                                                     | 1e-12      |
| annealing     | formula of learning rate annealing                                                     |            |
| batch_size    | number of sentences per training update                                                | 200        |
| epochs        | max number of epochs                                                                   | 1000       |
| patience      | patience for early stop                                                                | 100        |
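The `epochs` and `patience` settings interact as in the sketch below: training runs for at most `epochs` epochs and stops early once the dev score has not improved for `patience` consecutive epochs. The helper `train_with_patience` and its `run_epoch` callback are hypothetical names used only for illustration, and the exact stopping rule in the repository may differ.

```python
from typing import Callable, Tuple


def train_with_patience(
    run_epoch: Callable[[int], float],
    epochs: int = 1000,
    patience: int = 100,
) -> Tuple[int, float]:
    """Run epochs until the dev score stalls; return (best_epoch, best_score).

    run_epoch(epoch) is assumed to train for one epoch and return a dev
    score where higher is better.
    """
    best_epoch, best_score = 0, float("-inf")
    for epoch in range(1, epochs + 1):
        score = run_epoch(epoch)
        if score > best_score:
            best_epoch, best_score = epoch, score
            # A real training loop would save the model checkpoint here.
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch, best_score


# Made-up dev scores, with a small patience so the stop is visible.
dev_scores = [90.0, 92.5, 92.1, 92.4, 91.0, 93.0]
best = train_with_patience(lambda e: dev_scores[e - 1], epochs=6, patience=3)
print(best)
```

With `patience=3` the loop stops after the fifth epoch and never sees the sixth score, which illustrates the trade-off: a small patience saves time but can miss late improvements.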



