
Support pre-packed weights #483

Merged
merged 2 commits into main on Dec 25, 2024
Conversation

robertknight (Owner) commented Dec 24, 2024

This reduces inference time for MatMul operations at the cost of higher memory usage. Prepacking is disabled by default for now, since there are common situations (a single inference run, large transformer decoder models) where it isn't worthwhile. It is enabled via ModelOptions::prepack_weights:

let mut model_options = ModelOptions::with_all_ops();
model_options.prepack_weights(true);
let model = model_options.load_file(path)?;

To facilitate a quick comparison of the performance impact of prepacking, the rten CLI now has a --prepacked option to enable prepacking.
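For example, a before/after timing comparison might look like the following. This is a sketch: only the --prepacked flag is established by this PR, and the exact CLI invocation and model filename are assumptions.

```
# Baseline: run the model without prepacking
rten model.rten

# Same model with prepacked weights enabled
rten --prepacked model.rten
```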

There are some caveats:

  • Non-MatMul operations which use matmuls internally (Conv, ConvTranspose, LSTM, GRU etc.) currently don't prepack their weights.
  • MatMul operations which turn out to be matrix-vector (gemv) products don't use the prepacked weights. This affects transformer decoders doing non-batched generation after the initial prompt encoding step.
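Caveats aside, the core idea can be sketched in a few lines of plain Rust. This is a toy illustration of prepacking in general, with a simple column-major repack standing in for the blocked, cache-friendly layout a real GEMM kernel would use; it is not rten's actual implementation.

```rust
/// Column-major copy of a row-major weight matrix, standing in for the
/// packed layout a real matmul kernel would prefer. Built once for a
/// constant weight, then reused across many matmul calls.
struct PackedWeights {
    rows: usize,
    cols: usize,
    data: Vec<f32>, // column-major
}

/// Repack a row-major `rows x cols` weight matrix into column-major order.
fn prepack(weights: &[f32], rows: usize, cols: usize) -> PackedWeights {
    let mut data = vec![0.0; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            data[c * rows + r] = weights[r * cols + c];
        }
    }
    PackedWeights { rows, cols, data }
}

/// Multiply a row-major `m x rows` input by the packed weights,
/// producing a row-major `m x cols` output.
fn matmul_prepacked(input: &[f32], m: usize, packed: &PackedWeights) -> Vec<f32> {
    let mut out = vec![0.0; m * packed.cols];
    for i in 0..m {
        for c in 0..packed.cols {
            // Each output column reads one contiguous packed column.
            let col = &packed.data[c * packed.rows..(c + 1) * packed.rows];
            out[i * packed.cols + c] = (0..packed.rows)
                .map(|k| input[i * packed.rows + k] * col[k])
                .sum();
        }
    }
    out
}

fn main() {
    // 2x2 identity weights: the output should equal the input row.
    let packed = prepack(&[1.0, 0.0, 0.0, 1.0], 2, 2);
    let out = matmul_prepacked(&[3.0, 4.0], 1, &packed);
    println!("{:?}", out); // [3.0, 4.0]
}
```

The cost of `prepack` is paid once at load time; every subsequent inference pays only the matmul, which is the trade-off this PR makes configurable.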

Fixes #214.

TODO:

  • Tests
  • Make it configurable whether prepacking is used
  • Support subgraphs

robertknight force-pushed the prepack-weights branch 4 times, most recently from 0a99fdd to 2b245d4, on December 25, 2024 at 08:16
This reduces inference time for matmul operations at the cost of higher memory
usage.

 - Add methods to `Operator` trait to declare which inputs can
   potentially be pre-packed and to prepack those inputs.

 - Add `Graph::prepack_weights` method to traverse operators and prepack
   inputs whose values are constant nodes.

 - Implement prepacking methods for MatMul and fused MatMul ops.

 - Add APIs to `ModelOptions` to enable prepacking.

There are some caveats:

 - Non-MatMul operations which use matmuls internally (Conv, ConvTranspose,
   LSTM, GRU etc.) currently don't prepack their weights.

 - MatMul operations which turn out to be matrix-vector (gemv) products don't
   use the prepacked weights. This affects transformer decoders doing
   non-batched generation after the initial prompt encoding step.
robertknight marked this pull request as ready for review on December 25, 2024 at 08:36
robertknight merged commit fffb990 into main on Dec 25, 2024 (2 checks passed)
robertknight deleted the prepack-weights branch on December 25, 2024 at 08:36
robertknight changed the title from "Support pre-packing weights after model optimization" to "Support pre-packed weights" on Dec 25, 2024
Linked issue: Prepack weights when model is loaded (#214)