Sparsity #1

robertgshaw2-redhat · 2024-01-27T20:15:12Z

Refactored @alexm-nm's work from the old private repo.

In addition to porting Alex's original work, I avoid materializing the dense matrices on CPU during weight loading (the previous iteration materialized the entire model). The dense matrices needed for loading are now created on the fly.
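A minimal sketch of the on-the-fly idea, not the code in this PR: each tensor is densified only while it is being loaded and is stored compressed right away, so at most one dense matrix is live at a time. The compressed format and the names `compress`, `decompress`, `load_lazily`, and `fake_checkpoint` are hypothetical and use plain Python lists instead of the real sparse kernels.

```python
from typing import Dict, Iterator, List, Tuple

Dense = List[List[float]]
# Hypothetical compressed format: ((rows, cols), [(flat_index, value), ...]).
Sparse = Tuple[Tuple[int, int], List[Tuple[int, float]]]


def compress(dense: Dense) -> Sparse:
    """Keep only the non-zero entries, keyed by flattened position."""
    rows, cols = len(dense), len(dense[0])
    nonzeros = [
        (r * cols + c, v)
        for r, row in enumerate(dense)
        for c, v in enumerate(row)
        if v != 0.0
    ]
    return (rows, cols), nonzeros


def decompress(sparse: Sparse) -> Dense:
    """Rebuild the dense matrix from the stored non-zero entries."""
    (rows, cols), nonzeros = sparse
    dense = [[0.0] * cols for _ in range(rows)]
    for flat, v in nonzeros:
        dense[flat // cols][flat % cols] = v
    return dense


def load_lazily(checkpoint: Iterator[Tuple[str, Dense]]) -> Dict[str, Sparse]:
    """Compress each tensor as it arrives instead of densifying the whole model."""
    params: Dict[str, Sparse] = {}
    for name, dense in checkpoint:
        # Only this one dense tensor exists here; it is dropped after compression.
        params[name] = compress(dense)
    return params


def fake_checkpoint() -> Iterator[Tuple[str, Dense]]:
    """Stand-in for a weight iterator that yields one tensor at a time."""
    yield "layer0.weight", [[0.0, 1.5], [0.0, 0.0]]
    yield "layer1.weight", [[2.0, 0.0], [0.0, -3.0]]


params = load_lazily(fake_checkpoint())
```

Because the checkpoint is consumed as a generator, peak dense memory in this sketch is bounded by the largest single tensor rather than by the whole model.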

Improvements:

- Currently, when we load the QKV matrix, we unpack and repack three times (once per shard). Loading could be faster if we unpacked on the first shard load and repacked only once all shards have been loaded.
- ~~CUDAGraphs are not working~~ (fixed by https://github.com/neuralmagic/nm_gpu/pull/15)
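The QKV loading improvement described above can be sketched as follows. This is an illustration under assumptions, not the loader in this PR: `PackedQKV`, `load_shard_naive`, and `load_shard_deferred` are hypothetical names, the fused parameter is 1-D for brevity, and "packing" is modeled as storing non-zero `(index, value)` pairs.

```python
from typing import Dict, List, Optional, Set, Tuple


class PackedQKV:
    """Toy stand-in for a compressed, fused QKV parameter."""

    def __init__(self, shard_sizes: Dict[str, int]) -> None:
        # Record where each shard (e.g. "q", "k", "v") lives in the fused tensor.
        self.offsets: Dict[str, Tuple[int, int]] = {}
        start = 0
        for shard_id, size in shard_sizes.items():
            self.offsets[shard_id] = (start, size)
            start += size
        self.length = start
        self.packed: List[Tuple[int, float]] = []
        self.pack_calls = 0
        self.unpack_calls = 0
        self._staging: Optional[List[float]] = None
        self._pending: Set[str] = set(shard_sizes)

    def _unpack(self) -> List[float]:
        self.unpack_calls += 1
        dense = [0.0] * self.length
        for i, v in self.packed:
            dense[i] = v
        return dense

    def _pack(self, dense: List[float]) -> None:
        self.pack_calls += 1
        self.packed = [(i, v) for i, v in enumerate(dense) if v != 0.0]

    def _write(self, dense: List[float], shard_id: str, values: List[float]) -> None:
        start, size = self.offsets[shard_id]
        if len(values) != size:
            raise ValueError(f"shard {shard_id!r} expects {size} values")
        dense[start:start + size] = values

    def load_shard_naive(self, shard_id: str, values: List[float]) -> None:
        """Current behavior: one unpack and one repack per shard."""
        dense = self._unpack()
        self._write(dense, shard_id, values)
        self._pack(dense)

    def load_shard_deferred(self, shard_id: str, values: List[float]) -> None:
        """Proposed behavior: unpack on the first shard, repack after the last."""
        if self._staging is None:
            self._staging = self._unpack()
        self._write(self._staging, shard_id, values)
        self._pending.discard(shard_id)
        if not self._pending:
            self._pack(self._staging)
            self._staging = None


shards = {"q": [1.0, 0.0], "k": [0.0, 2.0], "v": [0.0, 0.0]}
sizes = {name: len(vals) for name, vals in shards.items()}

naive = PackedQKV(sizes)
deferred = PackedQKV(sizes)
for shard_id, values in shards.items():
    naive.load_shard_naive(shard_id, values)
    deferred.load_shard_deferred(shard_id, values)
```

Both paths end with the same packed contents, but the deferred path performs one unpack and one repack instead of three of each. The trade-off in this sketch is that the dense staging buffer stays alive until the last shard arrives.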

alexm-redhat

LGTM! Thanks for the port

robertgshaw2-redhat and others added 5 commits January 27, 2024 15:40

initial commit

32491c6

end to end appears to be working

8db3a54

updated README.md

16dd160

readded moved files

f7e2bf5

Update __init__.py

b3c32f3

robertgshaw2-redhat requested a review from alexm-redhat January 27, 2024 20:19

alexm-redhat approved these changes Jan 27, 2024

vllm/config.py (resolved)

vllm/config.py (resolved)

vllm/engine/arg_utils.py (outdated, resolved)

vllm/model_executor/layers/linear.py (outdated, resolved)

vllm/model_executor/layers/linear.py (resolved)

robertgshaw2-redhat added 2 commits January 27, 2024 15:42

Update linear.py

7d59c40

Update arg_utils.py

cd9c32d

robertgshaw2-redhat requested review from alexm-redhat and LucasWilkinson January 27, 2024 20:45

alexm-redhat approved these changes Jan 27, 2024

robertgshaw2-redhat and others added 5 commits January 28, 2024 22:44

missed files

669ab5b

Update __init__.py

611cb23

newline

Update base_config.py

f9fe781

Update sparse_w16a16.py

aa434c2

newline

Sync with latest changes from magic_wand

87e7516

alexm-redhat merged commit 0b0a588 into main Feb 1, 2024

afeldman-nm mentioned this pull request Feb 9, 2024

Semi-structured 2:4 sparsity via SparseSemiStructuredTensor #4

Merged

LucasWilkinson mentioned this pull request Mar 17, 2024

[Misc]: Move from using PYBIND11_MODULE macro to bind C++/CUDA kernels to python to using TORCH_LIBRARY macro #133

Closed

afeldman-nm pushed a commit that referenced this pull request Mar 25, 2024

Merge pull request #1 from afeldman-nm/enc_dec_t5

db726e6

T5 enc/dec example file; linting/formatting
