Update master to Sockeye 2 #822
Merged
…nding inference code. Removed weight normalization from OutputLayer (not used)
…lated bug in decoder
…th Mixed as it's not needed. Comment out tutorial args tests -> tutorials need updating to transformer models
* Update to MXNet 1.6.0
* Add CUDA 10.2
* Changelog
* Fail on empty target validation sentences.
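A minimal sketch of the behavior described in that commit (illustrative only, not Sockeye's actual data-loading code; the function name is made up):

```python
def check_target_sentences(target_sentences):
    """Raise instead of silently accepting empty target-side validation sentences."""
    for line_number, sentence in enumerate(target_sentences, start=1):
        if not sentence.strip():
            raise ValueError("Empty target validation sentence at line %d" % line_number)
```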
* Option for setting parameters in model
* Unit tests and flags for set_parameters

Co-authored-by: Currey <[email protected]>
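A rough sketch of how such a parameter-setting option can work on top of MXNet Gluon; the function name, argument names, and checks below are illustrative and not necessarily Sockeye's actual set_parameters implementation:

```python
import mxnet as mx

def set_parameters(block: mx.gluon.Block, new_params: dict) -> None:
    """Overwrite selected parameters of a Gluon block with user-provided arrays."""
    params = block.collect_params()
    for name, array in new_params.items():
        if name not in params:
            raise ValueError("Parameter '%s' not found in the model" % name)
        if params[name].shape != array.shape:
            raise ValueError("Shape mismatch for '%s': model has %s, got %s"
                             % (name, params[name].shape, array.shape))
        params[name].set_data(array)
```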
* fp16 with fp32 accumulation on log_softmax
* Hybrid beam search take, removing encoder takes
* Bulk prepare inference input in CPU before sending all to GPU
* Beam search decoding set to model dtype instead of fp32
* Replaced split-concat with slicing, added and modified comments, and some renaming
* Fixed test failures and errors
* Model state structure and resolved cherry-picking artifacts
* Corrected comments to match correct variables and shapes
* Flat state list, nesting determined by state structure
* Type declarations for ensemble decoding states
* Updated changelog and version
* Convert accumulated scores back to fp32 before argsort
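To illustrate the last bullet (a standalone sketch, not the PR's beam-search code): scores can be accumulated in the model dtype such as float16 and cast back to float32 just before argsort, so candidate ranking is not distorted by reduced fp16 precision. Shapes and names below are assumptions for the example.

```python
import mxnet as mx

beam_scores = mx.nd.random.uniform(shape=(4, 100), dtype='float16')  # (beam, vocab)
scores_fp32 = beam_scores.astype('float32')                          # cast back before ranking
best = mx.nd.argsort(scores_fp32, axis=1)[:, :5]                     # 5 lowest-cost candidates per row
```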
* Pad vocab to a multiple of 8 for quantization
* Single codebase using decoding float32 and int8 transformer, except embeddings
* No need for a space change in inference
* Remove logging code
* Undo changes to train.py defaults
* Allow casting to non-int8 types
* Move dtype to model
* Default to FullyConnected
* Remove unnecessary imports
* Comment weight initializer zeros
* Warning on cast
* Copyright on quantization.py, spacing fix
* Tuples as (1,)
* TransformerConfig doesn't have dtype anymore
* More dtype passing
* Output layer quantization
* Fix missing import/logger
* CPU-independent disk format

Works with this quantization program (TODO integrate):

```python
import mxnet as mx

model = mx.nd.load("/home/ubuntu/idid-enus/model.amt.sf-concat/params.best")
dense = [k[0:-7] for k in model.keys()
         if k.endswith('.weight') and not k.startswith("embedding_source.")]
dense.remove("encoder.pos_embedding")
dense.remove("decoder.pos_embedding")
for param in dense:
    name = param + ".weight"
    b = model[name]
    b_max = mx.nd.contrib.intgemm_maxabsolute(b)
    # The disk format just quantizes.
    b_prepared = mx.nd.contrib.intgemm_prepare_data(b, b_max)
    model[name] = b_prepared
    model[param + ".scaling"] = b_max / 127.0
mx.nd.save("/home/ubuntu/idid-enus/model.amt.sf-concat.quant/params.best", model)
```

* Update comment
* Version that loads a float32 model and quantizes on the fly, but it doesn't check that all parameters are in the provided model
* Disk saving option
* Wrap comment to 80 characters
* C.DTYPE_INT8 and space after #
* No spacing around keyword arguments
* Typing on convert_weights_disk_format

Co-Authored-By: Felix Hieber <[email protected]>

* Typing on convert_weights_cpu_dependent

Co-Authored-By: Felix Hieber <[email protected]>

* Make calls friendly to custom operators
* Hacky way to find custom operator
* Configurable to custom operator
* fheiber's patch to dtypes
* C.DTYPE_FP32 and remove errant ,
* Quantization: minimize mean squared error for parameters
* Use cached quantization scaling
* Quantization: do on-the-fly directly
* Hackily restore model type to saving type
* Quantization: store scaling
* Fix use of existing scaling factors

Co-authored-by: Felix Hieber <[email protected]>
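As a quick sanity check of the disk format produced by the program above (an illustrative snippet, not part of the PR): intgemm_prepare_data only quantizes without rearranging the layout, so a weight can be approximately recovered by casting the int8 tensor back to float32 and multiplying by its stored scaling factor. The intgemm operators are only available in MXNet builds compiled with intgemm support.

```python
import mxnet as mx

original = mx.nd.random.uniform(-0.5, 0.5, shape=(8, 8))
w_max = mx.nd.contrib.intgemm_maxabsolute(original)
quantized = mx.nd.contrib.intgemm_prepare_data(original, w_max)  # int8, same layout as the input
scaling = w_max / 127.0
restored = quantized.astype('float32') * scaling
print((original - restored).abs().max())  # worst-case round-trip quantization error
```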
* Quantize CLI, Docker build update, version/changelog update.
tdomhan approved these changes on Jun 3, 2020.
Merges Sockeye 2 (sockeye_2 branch) into master. Commits should not be squashed.

Pull Request Checklist

* Changes are complete (if posting work-in-progress code, prefix the title of the PR by "WIP:") until you can check this box.
* Unit tests pass (pytest)
* System tests pass (pytest test/system)
* Passed code style checking (./style-check.sh)
* Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.