Transformer model for Seq2Seq Machine Translation

Transformer model for Chinese-English translation

Basic Architecture

Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. 2017: 5998-6008.

Data Explanation

The Chinese-English translation data used in this project is only sample data; replace it with your own as needed. (Cn-863k, En-1.1M)

Data Format:

sentence-1-word-1 sentence-1-word-2 sentence-1-word-3. [\n]
sentence-2-word-1 sentence-2-word-2 sentence-2-word-3 sentence-2-word-4. [\n]
......

The Chinese and English data should be paired: each sentence in the Chinese file needs a matching sentence in the English file.
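
The format above can be checked before training with a small loader. This is a minimal sketch, not the project's own data_load.py: the function name `load_parallel` is hypothetical, and it assumes that "paired" means line i of the source file translates line i of the target file.

```python
def read_tokenized(path):
    """Read one sentence per line and split each line on whitespace."""
    with open(path, encoding="utf-8") as handle:
        return [line.split() for line in handle]


def load_parallel(source_path, target_path):
    """Return a list of (source_tokens, target_tokens) pairs.

    Assumption: the two files are aligned line by line.
    """
    source = read_tokenized(source_path)
    target = read_tokenized(target_path)
    if len(source) != len(target):
        raise ValueError(
            f"corpora are not paired: {len(source)} source lines "
            f"vs {len(target)} target lines"
        )
    return list(zip(source, target))
```

Calling `load_parallel("corpora/cn.txt", "corpora/en.txt")` would then fail early if the two files have drifted out of alignment.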

Installation

Python 3.6+ is required.

The following packages are needed:

regex==2018.1.10
terminaltables==3.1.0
torch==1.3.0
numpy==1.14.0
tensorboardX==1.9

You can install all requirements with:

pip3 install -r requirements.txt

Usage

Modifying hyperparameters

Modify the hyperparameters in hyperparams.py:

+------------------+---------------------+
| Parameters       | Value               |
+------------------+---------------------+
| source_train     | corpora/cn.txt      |
| target_train     | corpora/en.txt      |
| source_test      | corpora/cn.test.txt |
| target_test      | corpora/en.test.txt |
| batch_size       | 128                 |
| batch_size_valid | 64                  |
| lr               | 0.0002              |
| logdir           | logdir              |
| model_dir        | ./models/           |
| maxlen           | 50                  |
| min_cnt          | 0                   |
| hidden_units     | 512                 |
| num_blocks       | 12                  |
| num_epochs       | 50                  |
| num_heads        | 8                   |
| dropout_rate     | 0.4                 |
| sinusoid         | False               |
| eval_epoch       | 1                   |
| preload          | None                |
| eval_script      | scripts/validate.sh |
| check_frequence  | 10                  |
+------------------+---------------------+
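
As an illustration of how a few of these values relate to each other, here is a hedged sketch. The class layout and the `head_size` helper are assumptions made for the example, and the real hyperparams.py may be organised differently. It also assumes the usual multi-head attention layout, in which `hidden_units` is split evenly across `num_heads`.

```python
class Hyperparams:
    """Illustrative subset of the table above (layout is an assumption)."""
    batch_size = 128
    lr = 0.0002
    maxlen = 50
    hidden_units = 512
    num_blocks = 12
    num_heads = 8
    dropout_rate = 0.4


def head_size(hp):
    """Validate the settings and return the per-head dimension."""
    # Multi-head attention needs hidden_units to divide evenly by num_heads.
    if hp.hidden_units % hp.num_heads != 0:
        raise ValueError("hidden_units must be divisible by num_heads")
    # Dropout is a probability, so it must lie in [0, 1).
    if not 0.0 <= hp.dropout_rate < 1.0:
        raise ValueError("dropout_rate must be in [0, 1)")
    return hp.hidden_units // hp.num_heads


print(head_size(Hyperparams))  # 512 // 8 = 64
```

Running a check like this after editing the table values catches incompatible combinations before a long training run starts.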

Generating vocabulary

To generate the vocabulary for training, run prepro.py.

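
For readers unfamiliar with this step, the sketch below shows one common way to build a vocabulary from a tokenized corpus. It is not the code in prepro.py: the function name `build_vocab`, the special token names, and the reading of `min_cnt` as a minimum frequency threshold are all assumptions made for illustration.

```python
from collections import Counter

# Assumed special tokens; the project may use different names or ordering.
SPECIAL_TOKENS = ["<PAD>", "<UNK>", "<S>", "</S>"]


def build_vocab(sentences, min_cnt=0):
    """Map each sufficiently frequent word to an integer id.

    sentences: iterable of space-separated sentence strings.
    min_cnt: words seen fewer than min_cnt times are left out
             (assumed meaning of the min_cnt hyperparameter).
    """
    counts = Counter(word for sentence in sentences for word in sentence.split())
    # most_common() orders words from most to least frequent.
    kept = [word for word, count in counts.most_common() if count >= min_cnt]
    return {word: index for index, word in enumerate(SPECIAL_TOKENS + kept)}
```

With `min_cnt=0`, as in the table above, every word in the corpus is kept.
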
Training the model

Run train.py to start training the model.

Visualize the training process on TensorBoard:
```
tensorboard --logdir runs
```

Evaluation

The evaluation metric for Chinese-English translation is case-insensitive BLEU, computed with the multi-bleu.perl script from Moses.
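
To make the metric concrete, here is a simplified sketch of case-insensitive corpus-level BLEU in Python. It is not the Moses script and not the project's bleu.py: it assumes a single reference per hypothesis and whitespace tokenization, and the function names are hypothetical. Scores from the real script may differ in detail.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Case-insensitive corpus BLEU on a 0-100 scale, one reference each."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens = hyp.lower().split()  # lowercasing makes it case-insensitive
        ref_tokens = ref.lower().split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tokens, n)
            ref_counts = ngram_counts(ref_tokens, n)
            # Counter intersection clips each n-gram count at the reference count.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

Because both sides are lowercased first, "The Cat" and "the cat" count as a match, which is what case-insensitive means here.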

Result on tensorboard:

Because the sample data is so simple, these results serve only as a reference.

Device

Tested on CPU and a single GPU.

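+-------------+-------------------------------------------+----------------------+
| Device Type | Device                                    | Speed                |
+-------------+-------------------------------------------+----------------------+
| CPU         | Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz | 4 min 29 sec / Epoch |
| GPU         | GeForce GTX 1080 Ti                       | 48 sec / Epoch       |
+-------------+-------------------------------------------+----------------------+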

To Do

Train on public dataset
Test script

License

MIT License
