Add conv_seq_to_seq #431

Merged
merged 5 commits into from
Nov 14, 2017
51 changes: 50 additions & 1 deletion conv_seq_to_seq/README.md
@@ -1 +1,50 @@
[TBD]
# Convolutional Sequence to Sequence Learning
This model implements the work in the following paper:

Jonas Gehring, Michael Auli, David Grangier, et al. Convolutional Sequence to Sequence Learning. International Conference on Machine Learning (ICML), 2017

# Training a Model
- Modify the following script if needed and then run:

```bash
python train.py \
--train_data_path ./data/train_data \
--test_data_path ./data/test_data \
--src_dict_path ./data/src_dict \
--trg_dict_path ./data/trg_dict \
--enc_blocks "[(256, 3)] * 5" \
--dec_blocks "[(256, 3)] * 3" \
--emb_size 256 \
--pos_size 200 \
--drop_rate 0.1 \
--use_gpu False \
--trainer_count 1 \
--batch_size 32 \
--num_passes 20 \
>train.log 2>&1
```
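
The `--enc_blocks` and `--dec_blocks` values are written as Python-style list expressions. The sketch below shows one way such a string could be expanded into a list of blocks without calling `eval` on command-line input. It is an illustration only: `parse_blocks` is a hypothetical helper, not part of `train.py`, and the README does not state what the two numbers in each tuple mean, so the code only checks that each block is a 2-tuple.

```python
import ast


def parse_blocks(spec):
    """Expand a block spec such as "[(256, 3)] * 5" into a list of tuples.

    Hypothetical helper, not part of train.py. Only list literals, list
    repetition (list * int) and list concatenation (list + list) are accepted.
    """
    def expand(node):
        if isinstance(node, ast.List):
            # A plain list literal: evaluate each element as a literal.
            return [ast.literal_eval(elt) for elt in node.elts]
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mult):
            # Repetition: "[...] * n" with a literal n.
            return expand(node.left) * ast.literal_eval(node.right)
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            # Concatenation: "[...] + [...]".
            return expand(node.left) + expand(node.right)
        raise ValueError("unsupported block spec: %r" % spec)

    blocks = expand(ast.parse(spec, mode="eval").body)
    for block in blocks:
        if not (isinstance(block, tuple) and len(block) == 2):
            raise ValueError("each block must be a 2-tuple: %r" % (block,))
    return blocks
```

With this helper, `parse_blocks("[(256, 3)] * 5")` yields five identical `(256, 3)` blocks, matching the encoder setting in the command above.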

# Inferring with a Trained Model
- Run inference with a trained model:

```bash
python infer.py \
--infer_data_path ./data/infer_data \
--src_dict_path ./data/src_dict \
--trg_dict_path ./data/trg_dict \
--enc_blocks "[(256, 3)] * 5" \
--dec_blocks "[(256, 3)] * 3" \
--emb_size 256 \
--pos_size 200 \
--drop_rate 0.1 \
--use_gpu False \
--trainer_count 1 \
--max_len 100 \
--beam_size 1 \
--model_path ./params.pass-0.tar.gz \
1>infer_result 2>infer.log
```

# Notes

Currently, beam search runs a forward pass through the whole network for every predicted word, which wastes computation. We will fix this later.
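
To make the cost concrete, here is a toy sketch of decoding with beam size 1. It is not the code in this PR: `encode` and `decode_step` are hypothetical stand-ins for the real network, and the only point is to compare how often the encoder runs when the whole network is forwarded per word versus when the encoder output is computed once and reused.

```python
# Counter used to observe how many times the toy encoder is invoked.
calls = {"encode": 0}


def encode(src):
    """Toy stand-in for the encoder (hypothetical): counts invocations."""
    calls["encode"] += 1
    return [tok * 2 for tok in src]


def decode_step(enc_out, prefix):
    """Toy stand-in for one decoder step: returns the next token id."""
    return (sum(enc_out) + len(prefix)) % 7


def decode_naive(src, max_len):
    # Re-runs the whole network, encoder included, for every target word.
    prefix = []
    for _ in range(max_len):
        prefix.append(decode_step(encode(src), prefix))
    return prefix


def decode_cached(src, max_len):
    # Runs the encoder once; its output does not depend on the target prefix.
    enc_out = encode(src)
    prefix = []
    for _ in range(max_len):
        prefix.append(decode_step(enc_out, prefix))
    return prefix
```

Both functions produce the same tokens, but `decode_naive` calls `encode` once per target word while `decode_cached` calls it once per source sentence.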
Collaborator

Beam search will forward the encoder multiple times when predicting each target word, which requires extra computation.

Contributor Author

Done @lcy-seso