Papers to Read:

Megatron-LM paper close reading (video, "Paper Close Reading" series)

https://www.bilibili.com/video/BV1nB4y1R7Yz

FlashAttention:

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness https://arxiv.org/abs/2205.14135

Parameter Server paper, section-by-section close reading (video, "Paper Close Reading" series)

https://www.bilibili.com/video/BV1YA4y197G8

GPipe paper close reading (video)

https://www.bilibili.com/video/BV1v34y1E7zu

Pathways paper close reading (video)

https://www.bilibili.com/video/BV1xB4y1m7Xi

Papers and projects on Transformer inference acceleration

vLLM:

https://github.com/vllm-project/vllm

FasterTransformer:

https://github.com/NVIDIA/FasterTransformer

Transformer acceleration algorithms

Transformer acceleration techniques include quantization, pruning, and model distillation.
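
Of the three techniques, quantization is the easiest to illustrate in a few lines. The following is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. The function names `quantize_int8` and `dequantize_int8` are hypothetical and not taken from any library; real toolkits typically add per-channel scales and calibration.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale (assumed scheme)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q, scale, restored)
```

The round-trip error of each value is bounded by half a quantization step (`scale / 2`), which is the accuracy cost paid for storing one byte per weight.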

Some papers on speculative inference (speculative decoding):

Speculative Decoding: Google, ICML'23

Speculative Sampling: DeepMind, arXiv preprint arXiv:2302.01318, 2023

SpecInfer: CMU, arXiv preprint arXiv:2305.09781, 2023

Medusa: Princeton

LLM Accelerator: Microsoft, arXiv preprint arXiv:2304.04487, 2023

REST: Retrieval-Based Speculative Decoding: Peking, Princeton, arXiv preprint arXiv:2311.08252, 2023

Prompt lookup decoding (PLD): https://github.com/apoorvumang/prompt-lookup-decoding

Lookahead Decoding: https://lmsys.org/blog/2023-11-21-lookahead-decoding/
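
The entries above share a draft-then-verify idea, which can be sketched as a toy greedy variant. This is not the algorithm of any one listed paper: the sampling-based acceptance rule used in the Speculative Decoding and Speculative Sampling papers is not shown, and the toy models `target_next` and `draft_next`, like every other name here, are hypothetical. A real system verifies all drafted tokens in a single parallel forward pass of the target model; the sequential calls below are for readability only.

```python
def greedy_decode(next_token, prompt, max_new_tokens):
    """Baseline: call the model once per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def speculative_decode(target_next, draft_next, prompt, max_new_tokens, k=4):
    """Greedy draft-then-verify decoding; output matches greedy_decode on the target."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2. The target model checks the proposals left to right.
        accepted = []
        for tok in draft:
            expected = target_next(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch, drop the rest
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target contributes one extra token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal]


# Toy "models" over digits: the target counts upward modulo 10,
# the draft agrees except that it guesses wrongly after a 4.
def target_next(seq):
    return (seq[-1] + 1) % 10


def draft_next(seq):
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


result = speculative_decode(target_next, draft_next, [3], max_new_tokens=6)
baseline = greedy_decode(target_next, [3], max_new_tokens=6)
print(result, baseline)
```

Each round emits at least one target-approved token, so the output is identical to plain greedy decoding with the target model. The speedup in practice comes from accepting several draft tokens per target pass.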

Some other applications: