Update master to Sockeye 2 #822

Merged
merged 141 commits into master from sockeye_2_merge_again on Jun 3, 2020

Conversation

@fhieber (Contributor) commented Jun 3, 2020

Merges Sockeye 2 (sockeye_2 branch) into master.

Commits should not be squashed

Pull Request Checklist

  • Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]'
    until you can check this box).
  • Unit tests pass (pytest)
  • Were system tests modified? If so, did you run them at least 5 times to account for the variation across runs?
  • System tests pass (pytest test/system)
  • Passed code style checking (./style-check.sh)
  • You have considered writing a test
  • Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
  • Updated CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

fhieber added 30 commits June 7, 2019 15:10
…nding inference code. Removed weight normalization from OutputLayer (not used)
…th Mixed as it's not needed. Comment out tutorial args tests -> tutorials need updating to transformer models
fhieber and others added 21 commits February 25, 2020 10:30
* Update to MXNET 1.6.0

* Add CUDA 10.2

* changelog
* Fail on empty target validation sentences.
* Option for setting parameters in model

* Unit tests and flags for set_parameters

Co-authored-by: Currey <[email protected]>
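
The set_parameters option mentioned above assigns externally supplied arrays to named model parameters. A rough sketch of that idea for a Gluon-style model follows; the helper and its checks are illustrative only, not Sockeye's actual set_parameters implementation.

import mxnet as mx

def set_parameters(model, new_params):
    # Overwrite selected parameters by name; reject unknown names and
    # shape mismatches instead of silently ignoring them.
    params = model.collect_params()
    for name, value in new_params.items():
        if name not in params:
            raise ValueError("Parameter '%s' does not exist in the model" % name)
        if params[name].shape != value.shape:
            raise ValueError("Shape mismatch for '%s': %s vs. %s"
                             % (name, params[name].shape, value.shape))
        params[name].set_data(value)
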
* fp16 with fp32 accumulation on log_softmax

* Hybrid beam search take, removing encoder takes

* Bulk prepare inference input in CPU before sending all to GPU

* Beam search decoding set to model dtype instead of fp32

* Replaced split-concat with slicing, added and modified comments, and some renaming

* Fixed test failures and errors

* Model state structure and resolved cherry-picking artifacts

* Corrected comments to match correct variables and shapes

* Flat state list, nesting determined by state structure

* Type declarations for ensemble decoding states

* Updated changelog and version

* Convert accumulated scores back to fp32 before argsort
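
Two of the bullets above, fp16 with fp32 accumulation on log_softmax and converting accumulated scores back to fp32 before argsort, address the same precision concern. A hedged illustration of the pattern with MXNet NDArray ops; the shapes and variable names here are invented, not taken from Sockeye's beam search.

import mxnet as mx

# Hypothetical float16 decoder logits for a beam of 4 over a 32k vocabulary.
logits_fp16 = mx.nd.random.uniform(shape=(4, 32000), dtype='float16')

# log_softmax accepts an output dtype, so the reduction can run in float32
# even though the inputs are float16.
log_probs = mx.nd.log_softmax(logits_fp16, axis=-1, dtype='float32')

# Running beam costs kept in float16 are cast back to float32 before
# argsort so the ranking is not distorted by float16 rounding.
costs = -log_probs.astype('float16')   # stand-in for accumulated beam costs
best5 = mx.nd.argsort(costs.astype('float32'), axis=-1)[:, :5]  # lowest cost first
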
* Pad vocab to a multiple of 8 for quantization

* Single codebase using decoding float32 and int8 transformer, except embeddings

* No need for a space change in inference

* Remove logging code

* Undo changes to train.py defaults

* Allow casting to non-int8 types

* Move dtype to model

* Default to FullyConnected

* Remove unnecessary imports

* Comment weight initializer zeros

* Warning on cast

* Copyright on quantization.py, spacing fix

* Tuples as (1,)

* TransformerConfig doesn't have dtype anymore

* More dtype passing

* Output layer quantization

* Fix missing import/logger

* CPU-independent disk format

Works with this quantization program (TODO integrate):
import mxnet as mx
model = mx.nd.load("/home/ubuntu/idid-enus/model.amt.sf-concat/params.best")
# Names of the dense weight matrices to quantize; source embeddings and
# positional embeddings are excluded and stay in float32.
dense = [k[0:-7] for k in model.keys() if k.endswith('.weight') and not k.startswith("embedding_source.")]
dense.remove("encoder.pos_embedding")
dense.remove("decoder.pos_embedding")
for param in dense:
  name = param + ".weight"
  b = model[name]
  # Per-tensor scale is derived from the maximum absolute weight value.
  b_max = mx.nd.contrib.intgemm_maxabsolute(b)
  # The disk format just quantizes (no CPU-specific rearrangement).
  b_prepared = mx.nd.contrib.intgemm_prepare_data(b, b_max)
  model[name] = b_prepared
  model[param + ".scaling"] = b_max / 127.0
mx.nd.save("/home/ubuntu/idid-enus/model.amt.sf-concat.quant/params.best", model)
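
(Judging by the op names and the surrounding commits: intgemm_maxabsolute supplies the per-tensor max|w| used for the scale, and intgemm_prepare_data only quantizes to int8, which is what keeps the saved format CPU-independent; the CPU-dependent layout is produced separately at load time, per the convert_weights_disk_format / convert_weights_cpu_dependent split below.)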

* Update comment

* Version that loads a float32 model and quantizes on the fly
But it doesn't check all parameters are in the provided model

* Disk saving option

* Wrap comment to 80 characters

* C.DTYPE_INT8 and space after #

* No spacing around keyword arguments

* Typing on convert_weights_disk_format

Co-Authored-By: Felix Hieber <[email protected]>

* Typing on convert_weights_cpu_dependent

Co-Authored-By: Felix Hieber <[email protected]>

* Make calls friendly to custom operators

* Hacky way to find custom operator

* Configurable to custom operator

* fhieber's patch to dtypes

* C.DTYPE_FP32 and remove errant ,

* Quantization: minimize mean squared error for parameters

* Use cached quantization scaling

* Quantization: do on-the-fly directly

* Hackily restore model type to saving type

* Quantization: store scaling

* Fix use of existing scaling factors

Co-authored-by: Felix Hieber <[email protected]>
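
The minimize mean squared error and use cached quantization scaling bullets above describe choosing a per-tensor scale once and then reusing it. Below is a rough NumPy sketch of MSE-based scale selection; the candidate grid and search range are illustrative and not the actual search used in this PR.

import numpy as np

def mse_quant_scale(w, num_candidates=32):
    # Scan scales at and below max|w| / 127 and keep the one whose int8
    # round trip has the lowest mean squared error against the original.
    w = w.reshape(-1).astype(np.float32)
    base = max(float(np.abs(w).max()) / 127.0, 1e-12)
    best_scale, best_err = base, float('inf')
    for frac in np.linspace(0.5, 1.0, num_candidates):
        scale = base * frac
        q = np.clip(np.round(w / scale), -127, 127)
        err = float(np.mean((q * scale - w) ** 2))
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale  # cache per parameter so it is not recomputed on every load
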
* Quantize CLI, Docker build update, version/changelog update.
@fhieber merged commit 88dc440 into master on Jun 3, 2020
@fhieber deleted the sockeye_2_merge_again branch on June 3, 2020 at 09:30