This repository contains code for the paper "Neural Question Generation from Text: A Preliminary Study"
The experiments in the paper were done with an in-house deep learning tool, so this repository provides a PyTorch re-implementation as a reference. It implements only the NQG+ setting from the paper. With one hour of training on a Tesla P100, the NQG+ model reaches a BLEU-4 score of 12.78 on the dev set.
If you find this code useful in your research, please consider citing:
@article{zhou2017neural,
title={Neural Question Generation from Text: A Preliminary Study},
author={Zhou, Qingyu and Yang, Nan and Wei, Furu and Tan, Chuanqi and Bao, Hangbo and Zhou, Ming},
journal={arXiv preprint arXiv:1704.01792},
year={2017}
}
Make an experiment home folder for NQG data and code:
NQG_HOME=~/workspace/nqg
mkdir -p $NQG_HOME/code
mkdir -p $NQG_HOME/data
cd $NQG_HOME/code
git clone https://github.com/magic282/NQG.git
cd $NQG_HOME/data
wget https://res.qyzhou.me/redistribute.zip
unzip redistribute.zip
Put the unzipped data under $NQG_HOME/data so that the code and data are organized as:
nqg
├── code
│   └── NQG
│       └── seq2seq_pt
└── data
    └── redistribute
        ├── QG
        │   ├── dev
        │   ├── test
        │   ├── test_sample
        │   └── train
        └── raw
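Before preprocessing, it can be worth checking that the expected training files are in place. The sketch below is not part of the repository; it only assumes the layout above and the file names used in the commands that follow (adjust nqg_home if yours differs):

import os

# Hypothetical sanity check: adjust nqg_home if your layout differs.
nqg_home = os.path.expanduser("~/workspace/nqg")
train_dir = os.path.join(nqg_home, "data", "redistribute", "QG", "train")

# File names taken from the preprocessing commands below.
expected = [
    "train.txt.source.txt",
    "train.txt.target.txt",
    "train.txt.bio",
    "train.txt.pos",
    "train.txt.ner",
    "train.txt.case",
]

for name in expected:
    path = os.path.join(train_dir, name)
    status = "ok" if os.path.isfile(path) else "MISSING"
    print(status, path)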
Then collect vocabularies:
python $NQG_HOME/code/NQG/seq2seq_pt/CollectVocab.py \
$NQG_HOME/data/redistribute/QG/train/train.txt.source.txt \
$NQG_HOME/data/redistribute/QG/train/train.txt.target.txt \
$NQG_HOME/data/redistribute/QG/train/vocab.txt
python $NQG_HOME/code/NQG/seq2seq_pt/CollectVocab.py \
$NQG_HOME/data/redistribute/QG/train/train.txt.bio \
$NQG_HOME/data/redistribute/QG/train/bio.vocab.txt
python $NQG_HOME/code/NQG/seq2seq_pt/CollectVocab.py \
$NQG_HOME/data/redistribute/QG/train/train.txt.pos \
$NQG_HOME/data/redistribute/QG/train/train.txt.ner \
$NQG_HOME/data/redistribute/QG/train/train.txt.case \
$NQG_HOME/data/redistribute/QG/train/feat.vocab.txt
head -n 20000 $NQG_HOME/data/redistribute/QG/train/vocab.txt > $NQG_HOME/data/redistribute/QG/train/vocab.txt.20k
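For orientation, collecting a vocabulary here amounts to counting token frequencies over the listed files and writing the tokens out most-frequent first, which is why head -n 20000 keeps the 20,000 most frequent entries. The following is only an illustrative sketch under that assumption; the exact output format of CollectVocab.py (for example, whether counts or special tokens are written) may differ:

import sys
from collections import Counter

# Illustrative sketch only: count whitespace-separated tokens in all
# input files and write them one per line, most frequent first.
def collect_vocab(input_paths, output_path):
    counts = Counter()
    for path in input_paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                counts.update(line.split())
    with open(output_path, "w", encoding="utf-8") as out:
        for token, count in counts.most_common():
            out.write(token + "\t" + str(count) + "\n")

if __name__ == "__main__":
    # Usage mirrors the commands above: all arguments but the last are inputs.
    collect_vocab(sys.argv[1:-1], sys.argv[-1])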
Dependencies: nltk, scipy, numpy, and PyTorch.
PyTorch version: this code requires PyTorch v0.4.0.
Python version: this code requires Python 3.
Warning: older versions of NLTK have a bug in the PorterStemmer, so a fresh installation or update of NLTK is recommended.
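Assuming the packages are already installed, a quick version check such as the following (not part of the repository) confirms the environment matches these requirements:

import sys
import nltk
import torch

# Print the versions this README expects: Python 3 and PyTorch 0.4.0.
print("Python :", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("NLTK   :", nltk.__version__)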
A Docker image is also provided.
docker pull magic282/pytorch:0.4.0
The file run.sh is an example; modify it according to your configuration.
Without Docker:
bash $NQG_HOME/code/NQG/seq2seq_pt/run_squad_qg.sh $NQG_HOME/data/redistribute/QG $NQG_HOME/code/NQG/seq2seq_pt
With Docker, start a container:
nvidia-docker run --rm -ti -v $NQG_HOME:/workspace magic282/pytorch:0.4.0
Then, inside the container:
bash code/NQG/seq2seq_pt/run_squad_qg.sh /workspace/data/redistribute/QG /workspace/code/NQG/seq2seq_pt
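If training does not appear to use the GPU, a generic PyTorch check like the one below (not part of the provided scripts) can confirm that CUDA is visible inside the container:

import torch

# Generic check that CUDA is visible to PyTorch.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))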