
Update master to Sockeye 2 #822

Merged 141 commits on Jun 3, 2020
f5e9ec7
Initial commit of Sockeye 2.0 based on Gluon
fhieber Jun 7, 2019
64a6714
Delete image captioning code
fhieber Jun 7, 2019
8472cef
Fix test_fixed_param_strategy test
fhieber Jun 7, 2019
198e5b7
Fix test in test_arguments.py
fhieber Jun 7, 2019
ddb69b8
Remove test_attention, fix test_average
fhieber Jun 7, 2019
fd1a89e
Fix test_bleu and update sacrebleu to 1.3.5
fhieber Jun 7, 2019
486c596
Fix test_config
fhieber Jun 7, 2019
335d8da
Cleanup test_constraints
fhieber Jun 7, 2019
4bdb514
Remove test_coverage. Partially fix test_data_io
fhieber Jun 7, 2019
40c9afe
Removed RNN, CNN encoder/decoder
fhieber Jun 7, 2019
2a6ac0e
Remove breaking test code
fhieber Jun 7, 2019
49d75e1
Update some transformer tests
fhieber Jun 7, 2019
97901b4
update more tests, removed outdated tests
fhieber Jul 9, 2019
26bc186
Updated OutputLayer to support vocabulary selection. updated correspo…
fhieber Jul 10, 2019
d79e345
Fix tests related to get_max_output_len at inference time. Fixed a re…
fhieber Jul 10, 2019
b3a86da
Fix mock in test
fhieber Jul 10, 2019
b9544d3
Disable non-transformer integration tests
fhieber Jul 10, 2019
f2a8b8e
Revise scoring code to make integration tests pass. Moved load_models…
fhieber Jul 11, 2019
8380af0
Fix changelog version
fhieber Jul 11, 2019
357ba06
Fix LHUC and tests
fhieber Jul 11, 2019
bd1fe19
Merge branch 'master' into sockeye_2
fhieber Jul 11, 2019
16dba01
Rework scoring
fhieber Jul 11, 2019
e2bd484
Fix edge case with batch*beam == 1
fhieber Jul 11, 2019
c8934ea
Fix loading of translator and model in CheckpointDecoder
fhieber Jul 11, 2019
fce3c50
Fix none parsing in metrics file
fhieber Jul 11, 2019
f767ca3
use np.allclose
fhieber Jul 11, 2019
6c7cad3
Fix various secondary CLIs, test_other_clis now passes
fhieber Jul 11, 2019
63d45c8
Remove old cli arguments related to RNN/CNN. Remove initialization wi…
fhieber Jul 12, 2019
2a4a06d
Updated integration tests to cover more features with transformer model
fhieber Jul 12, 2019
71cf1ad
Remove unused WIP BeamSeach class for now
fhieber Jul 12, 2019
9c55986
Address mypi errors
fhieber Jul 12, 2019
fdf911f
Copy parallel code from gluonnlp to remove dependency
fhieber Jul 12, 2019
18e8d61
update test_loss.py
fhieber Jul 12, 2019
e1f9782
2nd constraint integration test passes with more updates. First one s…
fhieber Jul 12, 2019
7469551
print fix
fhieber Jul 12, 2019
83f3a87
Adressed a TODO w.r.t outputting translator scores
fhieber Jul 12, 2019
e5d7e34
Remove non-transformer system tests from travis.yml
fhieber Jul 12, 2019
2a2d40e
Delete old system tests
fhieber Jul 12, 2019
5e66797
Rename dummy test loss to avoid warning. Change test_constraints inte…
fhieber Jul 14, 2019
c43503d
Add alternative WIP loss implementation with label smoothing. Signifi…
fhieber Jul 15, 2019
5f0e5be
Remove old mxnet=1.3 code branch, cleanup in transformer.py
fhieber Jul 15, 2019
41407b7
Removed a few old TODOs
fhieber Jul 15, 2019
bba7e7a
More cleanup. Renamed load/save params methods in model.py after Gluo…
fhieber Jul 15, 2019
ad12091
Merge branch 'master' into sockeye_2
fhieber Jul 19, 2019
672d228
Update to MXNET 1.5.0
fhieber Jul 22, 2019
34086fc
fix numpy version
fhieber Jul 22, 2019
225f157
Compatibility with numpy>=1.16
fhieber Jul 22, 2019
4745231
Remove image captioning files
fhieber Jul 22, 2019
c768058
Fix Travis build by sorted test assertion, disabled constrained decod…
fhieber Jul 22, 2019
d2252f8
Renamed --max-input-len to --max-input-length. Added --max-output-len…
fhieber Jul 25, 2019
9046995
Re-enable constrained decoding integration tests
fhieber Jul 25, 2019
b368c4d
Fix test_arguments
fhieber Jul 25, 2019
885494e
Disable test_constraints_int. again...
fhieber Jul 25, 2019
38e7d94
Removed Python3.4 support
fhieber Jul 26, 2019
639d31c
Update seqcopy tutorial
fhieber Jul 28, 2019
4d28e0f
Fix FP16 training: not casting inputs to float16 due to limited fp16 …
fhieber Aug 1, 2019
d0bde1b
Add small TODO
fhieber Aug 5, 2019
ece002d
FP16 training: also avoid casting validation data, set MXNET_SAFE_ACC…
mjdenkowski Aug 2, 2019
1c5b27a
inference dtype inferred from model. dtype now stored in ModelConfig.…
fhieber Aug 5, 2019
67cf6c9
fix test_arguments.py
fhieber Aug 5, 2019
3796de0
Do not cast previous word to fp16 at inference
fhieber Aug 5, 2019
66bac22
Fix fp16 decoding: source ids were represented in fp16. Also made var…
fhieber Aug 5, 2019
e03acd3
Actually store dtype in model config
fhieber Aug 5, 2019
72c1f36
Use float32 for source and source_length at inference time to support…
fhieber Aug 5, 2019
20aa393
Move output layer call into decode_step interface function
fhieber Aug 6, 2019
abc8c84
Remove attention matrices from beam search, alignment visualization f…
fhieber Aug 7, 2019
1044bba
Sockeye 2 Training Update (#712)
mjdenkowski Aug 8, 2019
c40d173
Sockeye 2 cpdecoder (#711)
tdomhan Aug 8, 2019
d171579
Training time limit for Sockeye 2 (#716)
fhieber Aug 12, 2019
d7c3751
Only create checkpoint decoder for horovod primary worker (#717)
fhieber Aug 12, 2019
dc4c9fe
Port custom metrics logger to sockeye_2 (#714)
fhieber Aug 12, 2019
248ca88
Update license headers for 2019
fhieber Aug 27, 2019
3271ece
Revised and refactored beam search (#719)
fhieber Aug 29, 2019
26cbc97
More verbose message about target token counts (#721)
artemsok Aug 29, 2019
acb0815
Sockeye 2 Documentation Update (#722)
mjdenkowski Aug 29, 2019
9c892ec
Sockeye 2 Training Update (#723)
mjdenkowski Aug 29, 2019
97cfce9
Updated README.md with publications. Added a few TODOs (#724)
fhieber Aug 30, 2019
7dcbf29
Training: support save/load state with AMP (#725)
mjdenkowski Aug 31, 2019
6d2e1a3
Better fix for AMP and checkpoints (#726)
mjdenkowski Sep 4, 2019
89df1f5
Revert "Revised and refactored beam search (#719)"
mjdenkowski Sep 11, 2019
63f024a
Sockeye 2: Horovod Update and Minor Fixes (#728)
mjdenkowski Sep 12, 2019
4466d8d
Fix: zero means keep all checkpoints
mjdenkowski Sep 15, 2019
b62078c
Update metrics plotting script
mjdenkowski Sep 26, 2019
ff22b6a
Handle CUDA errors when checking number of GPUs
mjdenkowski Sep 26, 2019
eeb7483
Fix pylint errors
mjdenkowski Sep 26, 2019
6a5c63a
Threshold-based stopping (zero by default) (#730)
mjdenkowski Sep 27, 2019
b9e6632
Revised and refactored beam search (#719)
fhieber Aug 29, 2019
e0fa81a
Fix: Use the sorted model states in beam search.
tdomhan Oct 1, 2019
f69e030
Merge pull request #731 from awslabs/sockeye_2_beam_search_fix
tdomhan Oct 2, 2019
f23e3c5
Fix link to MXNet gluon API (#736)
kpuatamazon Oct 18, 2019
efab722
Test selection through testpaths. (#737)
tdomhan Oct 24, 2019
4d3261e
Add a checkpoint callback to the train function. (#741)
hmashlah Oct 30, 2019
619cab3
Rearrange test util methods to make them available in Sockeye library…
fhieber Nov 6, 2019
be7cfe3
Add option to suppress creation of logfiles (#745)
fhieber Nov 7, 2019
e497a2d
Use max_seq_len_* from prepared data when using prepared data
fhieber Nov 10, 2019
6ecf06f
Set ParallelModel threads to daemons, nicer parameter printing, re-en…
fhieber Nov 11, 2019
cc7dd43
[Sockeye 2] Max seconds were not part of args check (#749)
artemsok Nov 12, 2019
58750c7
[Sockeye 2] Prepare data logging fix (#750)
artemsok Nov 12, 2019
9aeaf27
Fix bug with prepare_data args
fhieber Nov 14, 2019
f568cba
Log versions for sockeye-prepare-data (#751)
fhieber Nov 15, 2019
3a716e5
Made mxnet random seeding device-independent. (#756)
fhieber Nov 19, 2019
6fb89f2
remove commented code
fhieber Nov 22, 2019
40fc596
Sockeye 2 training branch merge (#758)
mjdenkowski Nov 26, 2019
cceab94
Fix custom metrics logging to log all metrics with proper names (#759)
fhieber Nov 28, 2019
f4e0c0a
Use mx.context.gpu_memory_info() to retrieve memory usage (#760)
fhieber Nov 28, 2019
f5c7a77
update to sacrebleu 1.4.3 (#761)
fhieber Dec 3, 2019
57cb571
Added more flexibility for source factors combination (#763)
Dec 30, 2019
b938316
Training branch update: (#765)
mjdenkowski Dec 30, 2019
b0461b0
Updates to sockeye 2 (#766)
fhieber Jan 5, 2020
76e5a25
Fix for system tests (#767)
Jan 7, 2020
6ee72b0
Sparse gradient arrays for embeddings (#768)
fhieber Jan 16, 2020
3f61c26
Version bump (#770)
Jan 17, 2020
bdc65d9
Add more papers using Sockeye (#777)
fhieber Jan 30, 2020
4935efb
Sockeye multilingual tutorial (#779)
bricksdont Feb 3, 2020
913b4c2
Variable number of source factors for test generation (#780)
Feb 3, 2020
f2d74fe
Allow setting custom env variables for train & translate clis before …
fhieber Feb 10, 2020
f82f5a7
Minor: update README
fhieber Feb 11, 2020
b5e1a5b
use lru cache to cache vocab_slice_ids take (#784)
fhieber Feb 11, 2020
6dd2741
Update to MXNet 1.6 (#775)
fhieber Feb 25, 2020
7e715a7
Update setup.md (#789)
fhieber Feb 25, 2020
b08eb14
Do not store duplicate, shared parameters (#792)
fhieber Feb 27, 2020
ed503d3
Github action: nightly builds with mxnet (#795)
fhieber Mar 16, 2020
f3bb172
Sockeye 2 validcheck (#794)
tdomhan Mar 16, 2020
6026558
Use nightly build repo link
fhieber Mar 16, 2020
bcc30e4
Option for setting parameters in model (#800)
annacurrey Mar 27, 2020
9092292
Sockeye 2 Inference Optimizations (#798)
blchu Apr 3, 2020
1bf4006
Fix log message about source factors (#802)
fhieber Apr 16, 2020
afbde7a
Dockerfile for CPU-optimized Sockeye image (#803)
mjdenkowski Apr 22, 2020
54f72de
generate_graphs.py incorrect dependency. (#804)
SamuelLarkin Apr 27, 2020
8887712
Remove empty module sockeye_contrib.optimizers (#807)
fhieber Apr 27, 2020
3b23c78
Update papers with Sockeye (#806)
fhieber Apr 27, 2020
34c0960
Revise transformer state caching in beam search to cache transposed s…
fhieber May 13, 2020
e4553d3
[WIP] 8-bit quantization for inference (#771)
kpuatamazon May 20, 2020
50393fc
Sockeye 2 heafield quantize pr2 (#812)
mjdenkowski May 22, 2020
b1b0973
Process the shards using multiple processes in prepare_train_data (#813)
hmashlah May 25, 2020
6320542
Don't cast a model if it's already in that format. (#816)
kpuatamazon May 27, 2020
45d704a
fix Python 3.5 build, no format strings (#817)
fhieber May 27, 2020
d91f57b
Add Sockeye 2 project description paper (#819)
fhieber Jun 2, 2020
e433eae
Merge branch 'master' into sockeye_2_merge
fhieber Jun 3, 2020
16b38c3
Fix manifest
fhieber Jun 3, 2020
ed01ab8
Fix github actions
fhieber Jun 3, 2020
2 changes: 0 additions & 2 deletions .github/workflows/push_pr.yml
@@ -2,11 +2,9 @@ name: push and pull request testing
on:
push:
branches:
- sockeye_2
- master
pull_request:
branches:
- sockeye_2
- master

jobs:
2 changes: 0 additions & 2 deletions .gitignore
@@ -18,5 +18,3 @@
.pytest_cache
tags
sockeye/__pycache__
git_version.py

3 changes: 0 additions & 3 deletions .travis.yml
@@ -8,7 +8,6 @@ before_install:
- docker pull ubuntu:16.04

python:
- "3.4"
- "3.5"
- "3.6"

@@ -26,9 +25,7 @@ script:
- mypy --version
- mypy --ignore-missing-imports --follow-imports=silent @typechecked-files --no-strict-optional
- check-manifest --ignore sockeye/git_version.py
- if [ "$TRAVIS_EVENT_TYPE" != "cron" ]; then python -m pytest -k "Copy:lstm:lstm" --maxfail=1 test/system; fi
- if [ "$TRAVIS_EVENT_TYPE" != "cron" ]; then python -m pytest -k "Copy:transformer:transformer" --maxfail=1 test/system; fi
- if [ "$TRAVIS_EVENT_TYPE" != "cron" ]; then python -m pytest -k "Copy:cnn:cnn" --maxfail=1 test/system; fi
- if [ "$TRAVIS_EVENT_TYPE" = "cron" ]; then python -m pytest --maxfail=1 test/system; fi
- if [ "$TRAVIS_EVENT_TYPE" = "cron" ]; then python -m sockeye_contrib.autopilot.test; fi

128 changes: 93 additions & 35 deletions CHANGELOG.md
@@ -1,4 +1,5 @@
# Changelog

All notable changes to the project are documented in this file.

Version numbers are of the form `1.0.0`.
@@ -10,63 +11,120 @@ Note that Sockeye has checks in place to not translate with an old model that wa

Each version section may have subsections for: _Added_, _Changed_, _Removed_, _Deprecated_, and _Fixed_.

## [1.18.115]
### Added
- Added requirements for MXNet compatible with CUDA 10.1.
## [2.1.7]

## [1.18.114]
### Fixed
- Fix bug in prepare_train_data arguments.
### Changed

## [1.18.113]
### Fixed
- Added logging arguments for prepare_data CLI.
- Optimize prepare_data by saving the shards in parallel. The prepare_data script accepts a new parameter `--max-processes` to control the level of parallelism with which shards are written to disk.
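The parallel shard writing described above can be sketched with Python's standard `multiprocessing` module (the function names here are illustrative, not Sockeye's actual implementation):

```python
import multiprocessing
import pickle


def write_shard(args):
    """Serialize one shard to disk (hypothetical stand-in for the real shard writer)."""
    path, samples = args
    with open(path, "wb") as f:
        pickle.dump(samples, f)
    return path


def write_shards_parallel(shards, max_processes=1):
    """Write a list of (path, samples) pairs, using up to max_processes workers."""
    if max_processes == 1:
        return [write_shard(s) for s in shards]
    with multiprocessing.Pool(processes=max_processes) as pool:
        # pool.map preserves input order, so returned paths line up with shards
        return pool.map(write_shard, shards)
```

With `max_processes=1` the behavior stays sequential, matching the previous default.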

## [2.1.6]

### Changed

- Updated Dockerfiles optimized for CPU (intgemm int8 inference, full MKL support) and GPU (distributed training with Horovod). See [sockeye_contrib/docker](sockeye_contrib/docker).

## [1.18.112]
### Added
- Option to suppress creation of logfiles for CLIs (`--no-logfile`).

## [1.18.111]
- Official support for int8 quantization with [intgemm](https://github.com/kpu/intgemm):
- This requires the "intgemm" fork of MXNet ([kpuatamazon/incubator-mxnet/intgemm](https://github.com/kpuatamazon/incubator-mxnet/tree/intgemm)). This is the version of MXNet used in the Sockeye CPU docker image (see [sockeye_contrib/docker](sockeye_contrib/docker)).
- Use `sockeye.translate --dtype int8` to quantize a trained float32 model at runtime.
- Use the `sockeye.quantize` CLI to annotate a float32 model with int8 scaling factors for fast runtime quantization.
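At its core, annotating a float32 model with int8 scaling factors amounts to symmetric per-tensor quantization; a schematic NumPy version (not Sockeye's or intgemm's actual code, and it assumes the weight tensor is not all zeros):

```python
import numpy as np


def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: the scaling factor maps
    the largest absolute weight to 127."""
    scale = 127.0 / np.max(np.abs(w))
    q = np.clip(np.round(w * scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Recover approximate float32 weights from int8 values and the scale."""
    return q.astype(np.float32) / scale
```

Storing `scale` alongside the model (as `sockeye.quantize` does with its annotations) lets inference quantize activations consistently at runtime.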

## [2.1.5]

### Changed

- Changed state caching for transformer models during beam search to cache states with attention heads already separated out. This avoids repeated transpose operations during decoding, leading to faster inference.
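Schematically, the optimization pays the head-splitting transpose once when the state is first cached instead of on every decode step (shapes and names below are illustrative, not Sockeye's actual code):

```python
import numpy as np


def split_heads(x, num_heads):
    """(batch, seq, model) -> (batch * heads, seq, model // heads).

    Doing this once at cache time means each decode step can feed the
    cached keys/values straight into the attention matmul, with no
    per-step transpose."""
    batch, seq, model = x.shape
    head_dim = model // num_heads
    x = x.reshape(batch, seq, num_heads, head_dim)
    x = x.transpose(0, 2, 1, 3)  # (batch, heads, seq, head_dim)
    return x.reshape(batch * num_heads, seq, head_dim)
```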

## [2.1.4]

### Added
- Added an optional checkpoint callback for the train function.

- Added Dockerfiles that build an experimental CPU-optimized Sockeye image:
- Uses the latest versions of [kpuatamazon/incubator-mxnet](https://github.com/kpuatamazon/incubator-mxnet) (supports [intgemm](https://github.com/kpu/intgemm) and makes full use of Intel MKL) and [kpuatamazon/sockeye](https://github.com/kpuatamazon/sockeye) (supports int8 quantization for inference).
- See [sockeye_contrib/docker](sockeye_contrib/docker).

## [2.1.3]

### Changed
- Excluded gradients from pickled fields of TrainState

## [1.18.110]
- Performance optimizations to beam search inference
- Remove unneeded take ops on encoder states
- Gathering input data before sending to GPU, rather than sending each batch element individually
- All of beam search can be done in fp16, if specified by the model
- Other small miscellaneous optimizations
- Model states are now a flat list in ensemble inference, structure of states provided by `state_structure()`

## [2.1.2]

### Changed
- We now guard against failures to run `nvidia-smi` for GPU memory monitoring.
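Guarding against `nvidia-smi` failures amounts to wrapping the subprocess call; a sketch of the idea (not the exact Sockeye code):

```python
import subprocess


def query_gpu_memory():
    """Return a list of (used_mb, total_mb) per GPU, or [] if nvidia-smi
    is unavailable or fails (no GPU, driver error, timeout)."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader,nounits"],
            stderr=subprocess.DEVNULL, timeout=5)
    except (OSError, subprocess.SubprocessError):
        # Covers a missing binary (OSError) as well as non-zero exit
        # status and timeouts (SubprocessError subclasses).
        return []
    result = []
    for line in out.decode("utf-8").strip().splitlines():
        used, total = line.split(",")
        result.append((int(used), int(total)))
    return result
```

Training can then treat an empty result as "memory usage unknown" rather than crashing.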

## [1.18.109]
### Fixed
- Fixed the metric names by prefixing training metrics with 'train-' and validation metrics with 'val-'. Also restricted the custom logging function to accept only a dictionary and a compulsory global_step parameter.
- Updated to [MXNet 1.6.0](https://github.com/apache/incubator-mxnet/tree/1.6.0)

### Added

- Added support for CUDA 10.2

### Removed

- Removed support for CUDA<9.1 / CUDNN<7.5

## [2.1.1]

### Added
- Ability to set environment variables from training/translate CLIs before MXNet is imported. For example, users can
configure MXNet as such: `--env "OMP_NUM_THREADS=1;MXNET_ENGINE_TYPE=NaiveEngine"`
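The mechanism is simply to parse the `--env` string and update `os.environ` before the first `import mxnet`, since MXNet reads these variables at import time; a hypothetical sketch:

```python
import os


def apply_env(env_string):
    """Set KEY=VALUE pairs (';'-separated) in os.environ.

    Must run before 'import mxnet' so MXNet picks the values up
    when it initializes."""
    if not env_string:
        return
    for pair in env_string.split(";"):
        key, _, value = pair.partition("=")
        os.environ[key.strip()] = value.strip()


apply_env("OMP_NUM_THREADS=1;MXNET_ENGINE_TYPE=NaiveEngine")
# import mxnet  # only import now, so the settings take effect
```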

## [2.1.0]

## [1.18.108]
### Changed
- More verbose log messages about target token counts.

## [1.18.107]
- Version bump, which should have been included in commit b0461b due to incompatible models.

## [2.0.1]

### Changed
- Updated to [MXNet 1.5.0](https://github.com/apache/incubator-mxnet/tree/1.5.0)

## [1.18.106]
### Added
- Added an optional time limit for stopping training. The training will stop at the next checkpoint after reaching the time limit.
- Inference defaults to using the max input length observed in training (versus scaling down based on mean length ratio and standard deviations).

## [1.18.105]
### Added
- Added support for a possibility to have a custom metrics logger - a function passed as an extra parameter. If supplied, the logger is called during training.

## [1.18.104]
- Additional parameter fixing strategies:
- `all_except_feed_forward`: Only train feed forward layers.
- `encoder_and_source_embeddings`: Only train the decoder (decoder layers, output layer, and target embeddings).
- `encoder_half_and_source_embeddings`: Train the latter half of encoder layers and the decoder.
- Option to specify the number of CPU threads without using an environment variable (`--omp-num-threads`).
- More flexibility for source factors combination

## [2.0.0]

### Changed
- Implemented an attention-based copy mechanism as described in [Jia, Robin, and Percy Liang. "Data recombination for neural semantic parsing." (2016)](https://arxiv.org/abs/1606.03622).
- Added a <ptr\d+> special symbol to explicitly point at an input token in the target sequence
- Changed the decoder interface to pass both the decoder data and the pointer data.
- Changed the AttentionState named tuple to add the raw attention scores.

- Update to [MXNet 1.5.0](https://github.com/apache/incubator-mxnet/tree/1.5.0)
- Moved `SockeyeModel` implementation and all layers to [Gluon API](http://mxnet.incubator.apache.org/versions/master/gluon/index.html)
- Removed support for Python 3.4.
- Removed image captioning module
- Removed outdated Autopilot module
- Removed unused training options: Eve, Nadam, RMSProp, Nag, Adagrad, and Adadelta optimizers, `fixed-step` and `fixed-rate-inv-t` learning rate schedulers
- Updated and renamed learning rate scheduler `fixed-rate-inv-sqrt-t` -> `inv-sqrt-decay`
- Added script for plotting metrics files: [sockeye_contrib/plot_metrics.py](sockeye_contrib/plot_metrics.py)
- Removed option `--weight-tying`. Weight tying is enabled by default, disable with `--weight-tying-type none`.
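The `inv-sqrt-decay` scheduler named above follows the familiar warmup-then-inverse-square-root shape popularized by Vaswani et al. (2017); a schematic version (parameter names and the exact formula are illustrative, not necessarily Sockeye's):

```python
def inv_sqrt_decay(step, base_lr=1.0, warmup_steps=4000):
    """Linear warmup to base_lr, then decay proportional to 1/sqrt(step)."""
    step = max(step, 1)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (warmup_steps / step) ** 0.5
```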

### Added

- Added distributed training support with Horovod/OpenMPI. Use `horovodrun` and the `--horovod` training flag.
- Added Dockerfiles that build a Sockeye image with all features enabled. See [sockeye_contrib/docker](sockeye_contrib/docker).
- Added `none` learning rate scheduler (use a fixed rate throughout training)
- Added `linear-decay` learning rate scheduler
- Added training option `--learning-rate-t-scale` for time-based decay schedulers
- Added support for MXNet's [Automatic Mixed Precision](https://mxnet.incubator.apache.org/versions/master/tutorials/amp/amp_tutorial.html). Activate with the `--amp` training flag. For best results, make sure as many model dimensions as possible are multiples of 8.
- Added options for making various model dimensions multiples of a given value. For example, use `--pad-vocab-to-multiple-of 8`, `--bucket-width 8 --no-bucket-scaling`, and `--round-batch-sizes-to-multiple-of 8` with AMP training.
- Added [GluonNLP](http://gluon-nlp.mxnet.io/)'s BERTAdam optimizer, an implementation of the Adam variant used by Devlin et al. ([2018](https://arxiv.org/pdf/1810.04805.pdf)). Use `--optimizer bertadam`.
- Added training option `--checkpoint-improvement-threshold` to set the amount of metric improvement required over the window of previous checkpoints to be considered actual model improvement (used with `--max-num-checkpoint-not-improved`).
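The improvement-threshold check can be sketched as comparing the best metric value in the recent checkpoint window against the best seen before it (hypothetical logic, not Sockeye's implementation):

```python
def stop_training(metrics, window, threshold):
    """Return True if the best value inside the last `window` checkpoints
    failed to improve on the earlier best by at least `threshold`.
    Assumes higher is better (e.g. BLEU)."""
    if len(metrics) <= window:
        return False  # not enough history to judge yet
    earlier_best = max(metrics[:-window])
    recent_best = max(metrics[-window:])
    return recent_best - earlier_best < threshold
```

With `threshold=0`, any improvement at all resets the clock, matching the older `--max-num-checkpoint-not-improved` behavior.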

## [1.18.103]
### Added
- Added ability to score image-sentence pairs by extending the scoring feature originally implemented for machine
- Added ability to score image-sentence pairs by extending the scoring feature originally implemented for machine
translation to the image captioning module.

## [1.18.102]
@@ -95,7 +153,7 @@ Each version section may have subsections for: _Added_, _Changed_, _Removed

## [1.18.96]
### Changed
- Extracted prepare vocab functionality in the build vocab step into its own function. This matches the pattern in prepare data and train where the main() function only has argparsing, and it invokes a separate function to do the work. This is to allow modules that import this one to circumvent the command line.
- Extracted prepare vocab functionality in the build vocab step into its own function. This matches the pattern in prepare data and train where the main() function only has argparsing, and it invokes a separate function to do the work. This is to allow modules that import this one to circumvent the command line.

## [1.18.95]
### Changed
3 changes: 2 additions & 1 deletion MANIFEST.in
@@ -8,6 +8,7 @@ include .flake8
include typechecked-files
include test/data/config_with_missing_attributes.yaml
include sockeye/git_version.py
include *.bib
recursive-include .github *
include CONTRIBUTING.md
exclude *.sh
@@ -21,8 +22,8 @@ recursive-include docs *.html
recursive-include docs *.png
recursive-include docs *.md
recursive-include docs *.py
recursive-include docs *.sh
recursive-include docs *.yml
recursive-include docs *.ico
recursive-include docs *.css
recursive-include test *.txt
include docs/tutorials/multilingual/prepare-iwslt17-multilingual.sh
80 changes: 69 additions & 11 deletions README.md
@@ -6,29 +6,87 @@
[![Build Status](https://travis-ci.org/awslabs/sockeye.svg?branch=master)](https://travis-ci.org/awslabs/sockeye)
[![Documentation Status](https://readthedocs.org/projects/sockeye/badge/?version=latest)](http://sockeye.readthedocs.io/en/latest/?badge=latest)

This package contains the Sockeye project, a sequence-to-sequence framework for Neural Machine Translation based on Apache MXNet (Incubating).
It implements state-of-the-art encoder-decoder architectures, such as:
This package contains the Sockeye project, an open-source sequence-to-sequence framework for Neural Machine Translation based on [Apache MXNet (Incubating)](http://mxnet.incubator.apache.org/). Sockeye powers several Machine Translation use cases, including [Amazon Translate](https://aws.amazon.com/translate/). The framework implements state-of-the-art machine translation models with Transformers ([Vaswani et al, 2017](https://arxiv.org/abs/1706.03762)). Recent developments and changes are tracked in our [CHANGELOG](https://github.com/awslabs/sockeye/blob/master/CHANGELOG.md).

- Deep Recurrent Neural Networks with Attention [[Bahdanau, '14](https://arxiv.org/abs/1409.0473)]
- Transformer Models with self-attention [[Vaswani et al, '17](https://arxiv.org/abs/1706.03762)]
- Fully convolutional sequence-to-sequence models [[Gehring et al, '17](https://arxiv.org/abs/1705.03122)]
If you have any questions or discover problems, please [file an issue](https://github.com/awslabs/sockeye/issues/new). You can also send questions to *sockeye-dev-at-amazon-dot-com*.

In addition, it provides an experimental [image-to-description module](https://github.com/awslabs/sockeye/tree/master/sockeye/image_captioning) that can be used for image captioning.
Recent developments and changes are tracked in our [CHANGELOG](https://github.com/awslabs/sockeye/blob/master/CHANGELOG.md).
#### Version 2.0

If you have any questions or discover problems, please [file an issue](https://github.com/awslabs/sockeye/issues/new).
You can also send questions to *sockeye-dev-at-amazon-dot-com*.
With version 2.0, we have updated the usage of MXNet by moving to the [Gluon API](https://mxnet.incubator.apache.org/api/python/docs/api/gluon/index.html) and adding support for several state-of-the-art features such as distributed training, low-precision training and decoding, as well as easier debugging of neural network architectures.
In the context of this rewrite, we also trimmed down the large feature set of version 1.18.x to concentrate on the most important types of models and features, to provide a maintainable framework that is suitable for fast prototyping, research, and production.
We welcome Pull Requests if you would like to help with adding back features when needed.

## Installation

The easiest way to run Sockeye is with [Docker](https://www.docker.com) or [nvidia-docker](https://github.com/NVIDIA/nvidia-docker).
To build a Sockeye image with all features enabled, run the build script:

```bash
python3 sockeye_contrib/docker/build.py
```

See the [Dockerfile documentation](sockeye_contrib/docker) for more information.

## Documentation

For information on how to use Sockeye, please visit [our documentation](https://awslabs.github.io/sockeye/).
Developers may be interested in our [developer guidelines](https://awslabs.github.io/sockeye/development.html).

- For a quickstart guide to training a large data WMT model, see the [WMT 2018 German-English tutorial](https://awslabs.github.io/sockeye/tutorials/wmt_large.html).
- Developers may be interested in our [developer guidelines](https://awslabs.github.io/sockeye/development.html).

## Citation

For technical information about Sockeye, see our paper on the arXiv ([BibTeX](sockeye.bib)):
For more information about Sockeye 2, see our paper ([BibTeX](sockeye2.bib)):

> Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar. 2020.
> [Sockeye 2: A Toolkit for Neural Machine Translation](https://www.amazon.science/publications/sockeye-2-a-toolkit-for-neural-machine-translation). To appear in EAMT 2020, project track.

For technical information about Sockeye 1, see our paper on the arXiv ([BibTeX](sockeye.bib)):

> Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton and Matt Post. 2017.
> [Sockeye: A Toolkit for Neural Machine Translation](https://arxiv.org/abs/1712.05690). ArXiv e-prints.

## Research with Sockeye

Sockeye has been used for both academic and industrial research. A list of known publications that use Sockeye is shown below.
If you know more, please let us know or submit a pull request (last updated: April 2020).

### 2020

* Dinu, Georgiana, Prashant Mathur, Marcello Federico, Stanislas Lauly, Yaser Al-Onaizan. "Joint translation and unit conversion for end-to-end localization." arXiv preprint arXiv:2004.05219 (2020)
* Hisamoto, Sorami, Matt Post, Kevin Duh. "Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System?" Transactions of the Association for Computational Linguistics, Volume 8 (2020)
* Naradowsky, Jason, Xuan Zhan, Kevin Duh. "Machine Translation System Selection from Bandit Feedback." arXiv preprint arXiv:2002.09646 (2020)
* Niu, Xing, Marine Carpuat. "Controlling Neural Machine Translation Formality with Synthetic Supervision." Proceedings of AAAI (2020)

### 2019

* Agrawal, Sweta, Marine Carpuat. "Controlling Text Complexity in Neural Machine Translation." Proceedings of EMNLP (2019)
* Beck, Daniel, Trevor Cohn, Gholamreza Haffari. "Neural Speech Translation using Lattice Transformations and Graph Networks." Proceedings of TextGraphs-13 (EMNLP 2019)
* Currey, Anna, Kenneth Heafield. "Zero-Resource Neural Machine Translation with Monolingual Pivot Data." Proceedings of EMNLP (2019)
* Gupta, Prabhakar, Mayank Sharma. "Unsupervised Translation Quality Estimation for Digital Entertainment Content Subtitles." IEEE International Journal of Semantic Computing (2019)
* Hu, J. Edward, Huda Khayrallah, Ryan Culkin, Patrick Xia, Tongfei Chen, Matt Post, and Benjamin Van Durme. "Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting." Proceedings of NAACL-HLT (2019)
* Rosendahl, Jan, Christian Herold, Yunsu Kim, Miguel Graça, Weiyue Wang, Parnia Bahar, Yingbo Gao and Hermann Ney. "The RWTH Aachen University Machine Translation Systems for WMT 2019." Proceedings of the 4th WMT: Research Papers (2019)
* Thompson, Brian, Jeremy Gwinnup, Huda Khayrallah, Kevin Duh, and Philipp Koehn. "Overcoming catastrophic forgetting during domain adaptation of neural machine translation." Proceedings of NAACL-HLT 2019 (2019)
* Tättar, Andre, Elizaveta Korotkova, Mark Fishel. "University of Tartu’s Multilingual Multi-domain WMT19 News Translation Shared Task Submission." Proceedings of the 4th WMT: Research Papers (2019)

### 2018

* Domhan, Tobias. "How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures". Proceedings of 56th ACL (2018)
* Kim, Yunsu, Yingbo Gao, and Hermann Ney. "Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies." arXiv preprint arXiv:1905.05475 (2019)
* Korotkova, Elizaveta, Maksym Del, and Mark Fishel. "Monolingual and Cross-lingual Zero-shot Style Transfer." arXiv preprint arXiv:1808.00179 (2018)
* Niu, Xing, Michael Denkowski, and Marine Carpuat. "Bi-directional neural machine translation with synthetic parallel data." arXiv preprint arXiv:1805.11213 (2018)
* Niu, Xing, Sudha Rao, and Marine Carpuat. "Multi-Task Neural Models for Translating Between Styles Within and Across Languages." COLING (2018)
* Post, Matt and David Vilar. "Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation." Proceedings of NAACL-HLT (2018)
* Schamper, Julian, Jan Rosendahl, Parnia Bahar, Yunsu Kim, Arne Nix, and Hermann Ney. "The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018." Proceedings of the 3rd WMT: Shared Task Papers (2018)
* Schulz, Philip, Wilker Aziz, and Trevor Cohn. "A stochastic decoder for neural machine translation." arXiv preprint arXiv:1805.10844 (2018)
* Tamer, Alkouli, Gabriel Bretschner, and Hermann Ney. "On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation." Proceedings of the 3rd WMT: Research Papers (2018)
* Tang, Gongbo, Rico Sennrich, and Joakim Nivre. "An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation." Proceedings of 3rd WMT: Research Papers (2018)
* Thompson, Brian, Huda Khayrallah, Antonios Anastasopoulos, Arya McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson, and Philipp Koehn. "Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation." arXiv preprint arXiv:1809.05218 (2018)
* Vilar, David. "Learning Hidden Unit Contribution for Adapting Neural Machine Translation Models." Proceedings of NAACL-HLT (2018)
* Vyas, Yogarshi, Xing Niu and Marine Carpuat. "Identifying Semantic Divergences in Parallel Text without Annotations." Proceedings of NAACL-HLT (2018)
* Wang, Weiyue, Derui Zhu, Tamer Alkhouli, Zixuan Gan, and Hermann Ney. "Neural Hidden Markov Model for Machine Translation". Proceedings of 56th ACL (2018)
* Zhang, Xuan, Gaurav Kumar, Huda Khayrallah, Kenton Murray, Jeremy Gwinnup, Marianna J Martindale, Paul McNamee, Kevin Duh, and Marine Carpuat. "An Empirical Exploration of Curriculum Learning for Neural Machine Translation." arXiv preprint arXiv:1811.00739 (2018)

### 2017

* Domhan, Tobias and Felix Hieber. "Using target-side monolingual data for neural machine translation through multi-task learning." Proceedings of EMNLP (2017).