
Contains data/code for the paper "Neural Syntactic Preordering for Controlled Paraphrase Generation" (ACL 2020).





Additional data/resources/trained models are available at The link contains training data to train new models, and also trained models for both SOW and REAP.

The base environment is Python 3.6; see requirements.txt for the remaining dependencies. We used Stanford CoreNLP version 3.9.1.


The SOW-REAP pipeline generates syntactically diverse paraphrases in two steps.

  1. Source Order reWriting: An encoder-decoder model applies transduction operations to various versions of the input sentence, obtained by abstracting out different pairs of constituents.

  2. REarrangement Aware Paraphrasing: A combination of the transductions is chosen to construct an ordering guide. This ordering informs the generation of the output paraphrases.
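
To make "abstracting out different pairs of constituents" concrete, here is a minimal Python sketch. It is an illustration only, not the repository's code: the function names, the placeholder symbols X and Y, and the constituent spans (which in practice would come from a parse) are all assumptions.

```python
from itertools import combinations


def abstract_pair(tokens, span_a, span_b):
    """Replace two non-overlapping token spans with placeholders X and Y.

    Spans are (start, end) pairs with end exclusive; span_a must precede span_b.
    """
    (a0, a1), (b0, b1) = span_a, span_b
    if a1 > b0:
        raise ValueError("spans must be ordered and non-overlapping")
    return tokens[:a0] + ["X"] + tokens[a1:b0] + ["Y"] + tokens[b1:]


def abstracted_versions(tokens, spans):
    """Yield one abstracted version of the sentence per non-overlapping span pair."""
    for span_a, span_b in combinations(sorted(spans), 2):
        if span_a[1] <= span_b[0]:
            yield abstract_pair(tokens, span_a, span_b)


tokens = "if it rains , we will stay home".split()
# Hypothetical constituent spans: "it rains" and "we will stay home".
spans = [(1, 3), (4, 8)]
versions = [" ".join(v) for v in abstracted_versions(tokens, spans)]
print(versions)
```

Each abstracted version is the kind of phrase-level input a reordering model could then transduce, for example by swapping the order of the two placeholders.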


Training new models

  1. Download the training data from the Google Drive. Keep the data folder in the main folder. To train the model on custom data, refer to the instructions at
  2. Download the resources from the Google Drive. Keep them in the main folder.

Training SOW model

Data files train_sow.hdf5 and dev_sow.hdf5 are the training and dev files for SOW. They contain phrase-level inputs and outputs, with exactly two constituents abstracted out. We train a seq2seq transformer model to learn this transduction. Run the following commands from the project root folder to train the SOW model:

export PYTHONPATH=./
python sow/

Training REAP model

Data files train.hdf5 and dev.hdf5 are the training and dev files for REAP. REAP is a seq2seq model that paraphrases the input sentence, additionally conditioned on an input reordering that indicates the desired order of content in the output sentence. Run the following commands from the project root folder to train the REAP model:

export PYTHONPATH=./
python reap/
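
As a rough illustration of what an "input reordering" expresses, the sketch below treats it as one target position per source token. This representation and the function name are assumptions made for the example; the format actually stored in the data files may differ.

```python
def apply_reordering(tokens, positions):
    """Return tokens rearranged so that tokens[i] lands at index positions[i]."""
    if sorted(positions) != list(range(len(tokens))):
        raise ValueError("positions must be a permutation of 0..len(tokens)-1")
    reordered = [None] * len(tokens)
    for token, pos in zip(tokens, positions):
        reordered[pos] = token
    return reordered


tokens = ["if", "it", "rains", ",", "we", "stay", "home"]
# Hypothetical reordering: move the main clause in front of the conditional clause.
positions = [3, 4, 5, 6, 0, 1, 2]
guide = " ".join(apply_reordering(tokens, positions))
print(guide)
```

The rearranged source is only a guide to the desired content order; the paraphrase model is still free to change the wording when it generates the output.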


  1. Download the resources from the Google Drive. Keep them in the main folder.
  2. To use the trained models, download them from the Google Drive and change the model location in the arguments to the trained model location.

Paraphrases can be generated using three schemes:

  1. Baseline seq2seq that does not include any reorder information. Run See sample_test_baseline.txt for a sample input file. (Use the PTB tokenizer to tokenize the file before running the system; the sample file included with the code is already tokenized.)
java -mx4g -cp "*" edu.stanford.nlp.process.PTBTokenizer -preserveLines < sample_test_baseline.txt > sample_test_baseline.tok
  2. REAP model with ground truth ordering. See sample_test_gt_reap.txt for the required sample input file. The file contains sentence_1, sentence_2, sentence_1_reordering, sentence_2_reordering. See processing/ to generate this sample data (will be added soon).


  3. Full SOW-REAP model. This first produces k reorderings for the input sentence using SOW, then generates a paraphrase corresponding to each of those reorderings (using REAP). See sample_test_sow_reap.txt for the required sample input file. We use the Stanford CoreNLP parser to generate this. To generate this file for your custom dataset, run the following command (from the Stanford CoreNLP parser folder) on a file with the same input scheme as sample_test_baseline.tok. IMPORTANT: When using your own test data, tokenize the file separately (command above) before running the parser; otherwise some of the later code breaks or produces nonsensical outputs.
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,parse  -preserveLines -ssplit.eolonly true -outputFormat text -file sample_test_baseline.txt
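
Because the tokenizer is run with -preserveLines and the parser with -ssplit.eolonly true, both steps treat each input line as one sentence. A quick sanity check before parsing custom data is to confirm that the tokenized file still has one non-empty line per original line. The helper below is a hypothetical sketch and not part of the repository; the file names in the demo are placeholders.

```python
import tempfile
from pathlib import Path


def count_aligned_lines(raw_path, tok_path):
    """Return the shared line count, or raise if the two files are misaligned."""
    raw_lines = Path(raw_path).read_text(encoding="utf-8").splitlines()
    tok_lines = Path(tok_path).read_text(encoding="utf-8").splitlines()
    if len(raw_lines) != len(tok_lines):
        raise ValueError(
            f"line count mismatch: {len(raw_lines)} raw vs {len(tok_lines)} tokenized"
        )
    for number, line in enumerate(tok_lines, start=1):
        if not line.strip():
            raise ValueError(f"tokenized line {number} is empty")
    return len(tok_lines)


# Demo on throwaway files standing in for a raw file and its tokenized copy.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "my_test.txt"
    tok = Path(tmp) / "my_test.tok"
    raw.write_text("Hello, world!\nIt's fine.\n", encoding="utf-8")
    tok.write_text("Hello , world !\nIt 's fine .\n", encoding="utf-8")
    aligned = count_aligned_lines(raw, tok)
print(aligned)
```

A mismatch here usually means a sentence was split or dropped during tokenization, which would misalign the generated paraphrases with their inputs.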

The output from the Stanford CoreNLP parser serves as input to our SOW-REAP generator. To generate paraphrases:


The code in this repo uses the transformer implementation from

