
Support pre-packed weights #483

Merged
merged 2 commits into main on Dec 25, 2024
Conversation

robertknight (Owner) commented Dec 24, 2024

This reduces inference time for MatMul operations at the cost of higher memory usage. Prepacking is disabled by default for now, since there are common situations (a single inference run, large transformer decoder models) where it isn't worthwhile. It is enabled via ModelOptions::prepack_weights:

let mut model_options = ModelOptions::with_all_ops();
model_options.prepack_weights(true);
let model = model_options.load_file(path)?;

To facilitate a quick comparison of the performance impact of prepacking, the rten CLI now has a --prepacked option to enable prepacking.
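For example, a before/after timing comparison might look like the following. This is a sketch: only the --prepacked flag is established by this PR, and the exact CLI invocation and model filename are assumptions.

```
# Baseline: run the model without prepacking
rten model.rten

# Same model with prepacked weights enabled
rten --prepacked model.rten
```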

There are some caveats:

  • Non-MatMul operations which use matmuls internally (Conv, ConvTranspose, LSTM, GRU etc.) currently don't prepack their weights.
  • MatMul operations which turn out to be matrix-vector (gemv) products don't use the prepacked weights. This affects transformer decoders doing non-batched generation after the initial prompt encoding step.
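Caveats aside, the core idea can be sketched in a few lines of plain Rust. This is a toy illustration of prepacking in general, with a simple column-major repack standing in for the blocked, cache-friendly layout a real GEMM kernel would use; it is not rten's actual implementation.

```rust
/// Column-major copy of a row-major weight matrix, standing in for the
/// packed layout a real matmul kernel would prefer. Built once for a
/// constant weight, then reused across many matmul calls.
struct PackedWeights {
    rows: usize,
    cols: usize,
    data: Vec<f32>, // column-major
}

/// Repack a row-major `rows x cols` weight matrix into column-major order.
fn prepack(weights: &[f32], rows: usize, cols: usize) -> PackedWeights {
    let mut data = vec![0.0; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            data[c * rows + r] = weights[r * cols + c];
        }
    }
    PackedWeights { rows, cols, data }
}

/// Multiply a row-major `m x rows` input by the packed weights,
/// producing a row-major `m x cols` output.
fn matmul_prepacked(input: &[f32], m: usize, packed: &PackedWeights) -> Vec<f32> {
    let mut out = vec![0.0; m * packed.cols];
    for i in 0..m {
        for c in 0..packed.cols {
            // Each output column reads one contiguous packed column.
            let col = &packed.data[c * packed.rows..(c + 1) * packed.rows];
            out[i * packed.cols + c] = (0..packed.rows)
                .map(|k| input[i * packed.rows + k] * col[k])
                .sum();
        }
    }
    out
}

fn main() {
    // 2x2 identity weights: the output should equal the input row.
    let packed = prepack(&[1.0, 0.0, 0.0, 1.0], 2, 2);
    let out = matmul_prepacked(&[3.0, 4.0], 1, &packed);
    println!("{:?}", out); // [3.0, 4.0]
}
```

The cost of `prepack` is paid once at load time; every subsequent inference pays only the matmul, which is the trade-off this PR makes configurable.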

Fixes #214.

TODO:

  • Tests
  • Make it configurable whether prepacking is used
  • Support subgraphs

robertknight force-pushed the prepack-weights branch 4 times, most recently from 0a99fdd to 2b245d4, on December 25, 2024 at 08:16
This reduces inference time for matmul operations at the cost of higher memory
usage.

 - Add methods to `Operator` trait to declare which inputs can
   potentially be pre-packed and to prepack those inputs.

 - Add `Graph::prepack_weights` method to traverse operators and prepack
   inputs whose values are constant nodes.

 - Implement prepacking methods for MatMul and fused MatMul ops.

 - Add APIs to `ModelOptions` to enable prepacking.

There are some caveats:

 - Non-MatMul operations which use matmuls internally (Conv, ConvTranspose,
   LSTM, GRU etc.) currently don't prepack their weights.

 - MatMul operations which turn out to be matrix-vector (gemv) products don't
   use the prepacked weights. This affects transformer decoders doing
   non-batched generation after the initial prompt encoding step.
robertknight marked this pull request as ready for review on December 25, 2024 at 08:36
robertknight merged commit fffb990 into main on Dec 25, 2024 (2 checks passed)
robertknight deleted the prepack-weights branch on December 25, 2024 at 08:36
robertknight changed the title from "Support pre-packing weights after model optimization" to "Support pre-packed weights" on Dec 25, 2024
Linked issue: Prepack weights when model is loaded (#214)