Skip to content

Commit

Permalink
Update PUBLICATIONS.md
Browse files Browse the repository at this point in the history
add odyssey llm paper from metuan
  • Loading branch information
hwu36 authored Jan 18, 2024
1 parent 139b93d commit b4b5b11
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions PUBLICATIONS.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@

- ["A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library"](https://arxiv.org/abs/2312.11918). Ganesh Bikshandi and Jay Shah. _arXiv_, December 2023.

- ["A Speed Odyssey for Deployable Quantization of LLMs"](https://arxiv.org/abs/2311.09550). Qingyuan Li, Ran Meng, Yiduo Li, Bo Zhang, Liang Li, Yifan Lu, Xiangxiang Chu, Yerui Sun, Yuchen Xie. _arXiv_, November 2023.

- ["FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning"](https://arxiv.org/abs/2307.08691). Tri Dao. _Technical Report_, July 2023.

- ["MegaBlocks: Efficient Sparse Training with Mixture-of-Experts"](https://arxiv.org/abs/2211.15841). Trevor Gale, Deepak Narayanan, Cliff Young, Matei Zaharia. _Proceedings of the Sixth Machine Learning and Systems_, May 2023.
Expand Down

0 comments on commit b4b5b11

Please sign in to comment.