This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

h2o for kv cache compression #1468

Merged
merged 90 commits into from
Jul 29, 2024


n1ck-guo (Contributor) commented Apr 10, 2024

Type of Change

feature

Description

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
paper

To do

  • Add an example
  • Refactor the code to a consistent style
  • Add a sequence-length API
  • Support more models, in both sim and real modes
  • Add a mean accumulated-score function
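For readers unfamiliar with H2O, the core idea in the paper is an eviction policy for the KV cache: keep the most recent tokens plus the "heavy hitter" tokens that have received the most accumulated attention, and drop the rest. The sketch below illustrates that idea only. The function names, the `heavy_budget` and `recent_budget` parameters, and the list-based score format are hypothetical and are not taken from the code in this PR.

```python
from typing import List, Sequence


def accumulate(acc_scores: Sequence[float], attn_row: Sequence[float]) -> List[float]:
    """Add one decoding step's attention row to the running per-token scores.

    attn_row may be longer than acc_scores because it also covers the
    newly generated token, which starts with a score of zero.
    """
    padded = list(acc_scores) + [0.0] * (len(attn_row) - len(acc_scores))
    return [s + a for s, a in zip(padded, attn_row)]


def h2o_keep_indices(
    acc_scores: Sequence[float],
    heavy_budget: int,
    recent_budget: int,
) -> List[int]:
    """Return the sorted cache positions to keep under an H2O-style policy.

    acc_scores[i] is the attention token i has received so far, summed
    over decoding steps (hypothetical input format for this sketch).
    """
    n = len(acc_scores)
    if n <= heavy_budget + recent_budget:
        return list(range(n))  # the cache still fits within the budget

    # Always keep the most recent tokens.
    recent_start = n - recent_budget
    recent = list(range(recent_start, n))

    # Among the older tokens, keep those with the largest accumulated scores.
    older = range(recent_start)
    heavy = sorted(older, key=lambda i: acc_scores[i], reverse=True)[:heavy_budget]

    return sorted(heavy + recent)


scores = [0.9, 0.1, 0.6, 0.05, 0.2, 0.3]
print(h2o_keep_indices(scores, heavy_budget=2, recent_budget=2))  # [0, 2, 4, 5]
```

In this example, positions 4 and 5 are kept as the recent window, and positions 0 and 2 are kept as the two highest-scoring older tokens. A real implementation would apply the same selection per attention head on tensors rather than on Python lists.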

Expected Behavior & Potential Risk

None

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

github-actions bot commented Apr 10, 2024

⚡ Required checks status: All passing 🟢

Groups summary

🟢 Format Scan Tests workflow
| Check ID | Status | Error details |
| --- | --- | --- |
| format-scan (pylint) | success | |
| format-scan (bandit) | success | |
| format-scan (cloc) | success | |
| format-scan (cpplint) | success | |

These checks are required after the changes to:

  • intel_extension_for_transformers/transformers/kv_cache_compression/__init__.py
  • intel_extension_for_transformers/transformers/kv_cache_compression/models/__init__.py
  • intel_extension_for_transformers/transformers/kv_cache_compression/models/modeling_gaudi_llama.py
  • intel_extension_for_transformers/transformers/kv_cache_compression/models/modeling_llama.py
  • intel_extension_for_transformers/transformers/kv_cache_compression/prune/__init__.py
  • intel_extension_for_transformers/transformers/kv_cache_compression/prune/base.py
  • intel_extension_for_transformers/transformers/kv_cache_compression/prune/h2o.py

🟢 Optimize Unit Test workflow
| Check ID | Status | Error details |
| --- | --- | --- |
| optimize-unit-test-baseline | success | |
| optimize-unit-test-PR-test | success | |
| Genreate-OptimizeUT-Report | success | |

These checks are required after the changes to the same files listed under the Format Scan Tests workflow.

🟢 Engine Unit Test workflow
| Check ID | Status | Error details |
| --- | --- | --- |
| engine-unit-test-baseline | success | |
| engine-unit-test-PR-test | success | |
| Genreate-Engine-Report | success | |

These checks are required after the changes to the same files listed under the Format Scan Tests workflow.


Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.

BiaoFangAIA and others added 3 commits April 23, 2024 16:36
Signed-off-by: biao.fang <[email protected]>
Signed-off-by: biao.fang <[email protected]>
n1ck-guo and others added 4 commits April 25, 2024 03:01
Signed-off-by: n1ck-guo <[email protected]>
Signed-off-by: n1ck-guo <[email protected]>
Signed-off-by: n1ck-guo <[email protected]>
VincyZhang added the WIP label May 13, 2024
PenghuiCheng (Contributor) commented

Could we add a document introducing what h2o is?

changwangss (Contributor) commented Jul 2, 2024

The format scan was improved by #1647, which has been merged.

n1ck-guo (Contributor, Author) commented

> Could we add a document introducing what h2o is?

Added in the example README.

n1ck-guo removed the request for review from VincyZhang July 16, 2024 01:29
Signed-off-by: n1ck-guo <[email protected]>
n1ck-guo requested a review from VincyZhang as a code owner July 16, 2024 04:49
n1ck-guo removed the request for review from yintong-lu July 29, 2024 05:52
XuehaoSun merged commit 3e78ae8 into main Jul 29, 2024
17 checks passed
XuehaoSun deleted the hengguo/h2o branch July 29, 2024 05:58

7 participants