Torch code for Paraphrase Question Generation. For more information, please refer to the paper.
This code is written in Lua and requires Torch. The preprocessing code is in Python, and you need to install NLTK if you want to use it to tokenize the questions.
- pip install nltk
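As a rough illustration of what NLTK-based tokenization of a question looks like, here is a minimal sketch. The function name `tokenize`, the lowercasing step and the regex fallback are assumptions made for this example; they are not taken from the preprocessing scripts in this repo.

```python
import re


def tokenize(question, method="nltk"):
    """Split a question into lowercase tokens.

    Hypothetical helper for illustration only, not the repo's own code.
    """
    text = question.lower()
    if method == "nltk":
        try:
            # word_tokenize needs the NLTK tokenizer data to be downloaded;
            # a LookupError is raised when that data is missing.
            from nltk.tokenize import word_tokenize
            return word_tokenize(text)
        except (ImportError, LookupError):
            pass  # fall back to the regex tokenizer below
    # Assumed fallback: words and single punctuation marks become tokens.
    return re.findall(r"\w+|[^\w\s]", text)
```

Calling `tokenize("How do I learn Python?")` uses NLTK when it is installed and its tokenizer data is available, and otherwise falls back to the regex split.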
You also need to install the following package in order to run the code successfully.
We have prepared everything for you ;)
We referred to neuraltalk2 and Text-to-Image Synthesis when preparing our code base.
First, download the Quora Question Pairs dataset from the Quora Question Pairs website and put it in the data folder. Next, some preprocessing is needed: go to the prepro folder and run
$ python quora_prepro.py
Note: The code above generates JSON files with 100K question pairs for training, 5K question pairs for validation and 30K question pairs for testing.
If you want to change this, for example to use only 50K question pairs for training while keeping the rest the same, you need to make some minor changes to the code above. After this step, the following files are generated under the data folder: quora_raw_train.json, quora_raw_val.json and quora_raw_test.json.
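To make the split sizes concrete, here is a minimal sketch of carving a list of question pairs into train, validation and test sets with configurable sizes. It is not the code in quora_prepro.py: the function name `split_pairs`, its parameters and the seeded shuffle are assumptions for illustration, and how the actual script orders or selects pairs is not stated here.

```python
import random


def split_pairs(pairs, n_train=100_000, n_val=5_000, n_test=30_000, seed=0):
    """Return (train, val, test) lists of the requested sizes.

    Hypothetical helper; the defaults mirror the sizes quoted above.
    """
    needed = n_train + n_val + n_test
    if len(pairs) < needed:
        raise ValueError(f"need {needed} pairs, got {len(pairs)}")
    shuffled = list(pairs)                 # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # assumed: seeded shuffle for repeatability
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:needed]
    return train, val, test
```

Under these assumptions, training on 50K pairs while keeping the other sizes unchanged would be `split_pairs(pairs, n_train=50_000)`.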
$ python prepro_quora.py --input_train_json ../data/quora_raw_train.json --input_test_json ../data/quora_raw_test.json
This command computes the question features and generates two files in the data/ folder: quora_data_prepro.h5 and quora_data_prepro.json.
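Before training, it can help to check what the generated JSON file actually contains. The sketch below only lists the top-level entries with their types and sizes, so it makes no assumption about the key names inside quora_data_prepro.json; the helper name `summarize_json` is hypothetical.

```python
import json


def summarize_json(path):
    """Map each top-level key of a JSON file to (type name, length or None)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

    def describe(value):
        # Only containers and strings have a meaningful length.
        size = len(value) if isinstance(value, (list, dict, str)) else None
        return (type(value).__name__, size)

    if isinstance(data, dict):
        return {key: describe(value) for key, value in data.items()}
    # A non-dict root (for example a list) is reported under a single entry.
    return {"<root>": describe(data)}
```

For example, `summarize_json("data/quora_data_prepro.json")` run from the root directory returns one entry per top-level key.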
Everything is now ready to train the question paraphrase model. Go back to the root directory and run
th train.lua -input_ques_h5 data/quora_data_prepro.h5 -input_json data/quora_data_prepro.json
In the root folder, run
th eval.lua -input_ques_h5 data/quora_data_prepro.h5 -input_json data/quora_data_prepro.json
To evaluate the question paraphrases, you need to download the evaluation tool. To evaluate question pairs, you can use the script myeval.py under the coco-caption/ folder. If you need to evaluate based on BLEU, METEOR, ROUGE and CIDEr scores, follow all the instructions from this link here.
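For intuition about what a score like BLEU measures, here is a small sketch of clipped n-gram precision, the quantity BLEU is built on. It is not the coco-caption implementation: full BLEU also combines several n-gram orders with a geometric mean and applies a brevity penalty, and the function name and whitespace tokenization here are assumptions for illustration.

```python
from collections import Counter


def clipped_ngram_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also occur in the reference.

    Each n-gram is credited at most as often as it appears in the reference.
    """
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0  # candidate is shorter than n tokens
    overlap = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return overlap / total
```

With the generated paraphrase "how can i learn python" and the reference "how do i learn python", four of the five unigrams match, so the unigram precision is 0.8; two of the four bigrams match, so the bigram precision is 0.5.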
This code is taken from the OpenNMT repo
Step 1: Put the results checkpoint JSON file inside the check_point_json folder.
Step 2: Rename the checkpoint JSON file to resuts_json.
Step 3: Rename the ground truth JSON file to quora_prepro_test_updated_int_4k.
Step 4: Run the ./score.sh file.
Download all the data files from here.
- quora_data_prepro.h5
- quora_data_prepro.json
- quora_raw_train.json
- quora_raw_val.json
- quora_raw_test.json
The pre-trained model can be downloaded here.
If you use this code as part of any published research, please acknowledge the following paper:
@article{patro2018learning,
title={Learning Semantic Sentence Embeddings using Pair-wise Discriminator},
author={Patro, Badri N and Kurmi, Vinod K and Kumar, Sandeep and Namboodiri, Vinay P},
journal={arXiv preprint arXiv:1806.00807},
year={2018}
}