Many papers and blogs related to LLMs emerge every week, so I created this list to collect the ones I'm interested in. The list was started on Oct. 23, 2023, so some important papers published before that date may be missing.
- Llemma: An Open Language Model For Mathematics paper, code
- #mathematics #codellama
- Magicoder: Source Code Is All You Need (a fully open-sourced coding model) paper, code
- Simplifying Transformer Blocks paper
- Alternating Updates for Efficient Transformers (Google Research) (NeurIPS'23) paper
- CogVLM: Visual Expert for Pretrained Language Models paper, code
- Emu Edit: Precise Image Editing via Recognition and Generation Tasks (from Meta) (use text instructions to modify images) paper, blog
- LLaVA: Large Language and Vision Assistant (from Microsoft) (NeurIPS'23) Main Page, code
- Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities (from Google DeepMind) (text, video, audio) paper
- FP8-LM: Training FP8 Large Language Models paper, code
- Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer paper, code
- MoLORA (Mixture of LORA) (from cohere) paper, code
- Gated Linear Attention Transformers with Hardware-Efficient Training paper
- Performance is still worse than transformer-based models
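For readers new to linear attention, a minimal sketch of the recurrent form such models use: a 2-D state is decayed by a data-dependent gate and updated with a key-value outer product. Shapes, the elementwise gate layout, and variable names are my simplifications, not the paper's hardware-efficient chunked formulation.

```python
# Hedged sketch of a gated linear attention recurrence (not the paper's
# chunk-parallel implementation).
import torch

def gated_linear_attention(q, k, v, gate):
    # q, k: (seq_len, d_k); v: (seq_len, d_v); gate: (seq_len, d_k), values in (0, 1)
    d_k, d_v = q.shape[1], v.shape[1]
    state = torch.zeros(d_k, d_v)
    outputs = []
    for t in range(q.shape[0]):
        # decay the state with the gate, then add the new key-value outer product
        state = gate[t].unsqueeze(1) * state + torch.outer(k[t], v[t])
        outputs.append(q[t] @ state)     # read out with the query -> (d_v,)
    return torch.stack(outputs)          # (seq_len, d_v)
```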
- Finetuning LLMs with LoRA and QLoRA: Insights from Hundreds of Experiments (from lightning AI) blog
- LoRA: Low-Rank Adaptation of Large Language Models paper
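The core LoRA idea: freeze the pretrained weight and learn a low-rank update ΔW = BA scaled by α/r. A minimal PyTorch sketch, with the layer name, initialization, and hyperparameters chosen for illustration rather than copied from the paper's reference implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        # in practice this weight is loaded from the pretrained model and frozen
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        # y = x W^T + (alpha / r) * x A^T B^T
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

Because B starts at zero, the adapted layer initially reproduces the frozen base model exactly.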
- LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment paper
- Addresses world-knowledge forgetting during SFT
- A curated reading list of research in Adaptive Computation (AC) & Mixture of Experts (MoE) repo
- MegaBlocks: Efficient Sparse Training with Mixture-of-Experts (“Here's the paper you need to read to understand today” - Sasha Rush) paper
- Mistral MoE base model blog
- Calculate an MoE model by hand post
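In the same back-of-the-envelope spirit as the post above, a tiny script for counting total vs. active MoE feed-forward parameters; every dimension below is a hypothetical placeholder, not a published model configuration.

```python
# Back-of-the-envelope MoE FFN parameter count; all dimensions are illustrative.
d_model = 4096        # hidden size
d_ff = 14336          # FFN inner size
n_experts = 8         # experts per MoE layer
top_k = 2             # experts activated per token
n_layers = 32

# A SwiGLU-style FFN has three weight matrices: gate, up, down.
ffn_params_per_expert = 3 * d_model * d_ff
router_params = d_model * n_experts

total_moe_params = n_layers * (n_experts * ffn_params_per_expert + router_params)
active_moe_params = n_layers * (top_k * ffn_params_per_expert + router_params)

print(f"total MoE FFN params : {total_moe_params / 1e9:.1f} B")
print(f"active per token     : {active_moe_params / 1e9:.1f} B")
```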
- Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch paper, code
- Merging SFT models into the base LLM with the paper's method can improve performance (sketched below)
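A hedged sketch of that merging idea: take each SFT model's delta from the base weights, randomly drop most delta entries, rescale the survivors, and add the result back to the base. The function name, drop rate, and plain summation across models are my assumptions, not the paper's exact recipe.

```python
import torch

def merge_into_base(base_state, finetuned_states, drop_rate=0.9):
    # base_state / finetuned_states: state dicts of float weight tensors with matching keys
    merged = {name: w.clone() for name, w in base_state.items()}
    for ft_state in finetuned_states:
        for name, base_w in base_state.items():
            delta = ft_state[name] - base_w                       # task-specific delta
            mask = (torch.rand_like(delta) > drop_rate).float()   # randomly drop most entries
            delta = delta * mask / (1.0 - drop_rate)              # rescale the survivors
            merged[name] += delta
    return merged
```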
- S-LoRA (batched inference over many LoRA adapters) code
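To make the "batched LoRA" idea concrete, here is a sketch of one linear layer serving a batch in which every request selects a different adapter; the gather-based indexing is an illustrative simplification of what a real serving system such as S-LoRA does.

```python
import torch

def batched_lora_forward(x, base_W, lora_A, lora_B, adapter_ids, scaling=2.0):
    # x:           (batch, d_in) activations, one row per request
    # base_W:      (d_out, d_in) shared frozen base weight
    # lora_A:      (n_adapters, r, d_in) stacked adapter A matrices
    # lora_B:      (n_adapters, d_out, r) stacked adapter B matrices
    # adapter_ids: (batch,) which adapter each request uses
    A = lora_A[adapter_ids]                          # (batch, r, d_in)
    B = lora_B[adapter_ids]                          # (batch, d_out, r)
    base_out = x @ base_W.T                          # shared dense compute
    lora_out = torch.einsum("bd,brd->br", x, A)      # per-request x A^T
    lora_out = torch.einsum("br,bor->bo", lora_out, B)
    return base_out + scaling * lora_out
```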
- blogs:
- LLM series notes: quantitative analysis and acceleration of LLM inference (LLM系列笔记: LLM Inference量化分析与加速)
- How to make LLMs go fast blog
- PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU paper, code
- significantly reduces GPU memory demands and CPU-GPU data transfer
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory (From Apple) paper
- Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models (from DeepMind) paper
- Efficient Streaming Language Models with Attention Sinks paper, open source implementation
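The serving trick behind attention sinks can be sketched as a KV-cache eviction policy: always keep the first few tokens plus a sliding window of recent ones. Parameter names are illustrative, and the paper's re-assignment of positions inside the cache is omitted here.

```python
def evict_kv_cache(keys, values, n_sink=4, window=2044):
    # keys, values: Python lists of per-token K/V entries, ordered oldest to newest
    if len(keys) <= n_sink + window:
        return keys, values
    # keep the "sink" tokens at the start plus the most recent window
    keep_keys = keys[:n_sink] + keys[-window:]
    keep_values = values[:n_sink] + values[-window:]
    return keep_keys, keep_values
```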
- YaRN: Efficient Context Window Extension of Large Language Models paper, YaRN on Mistral-7b-128k
- RoPE scaling post, Hugging Face implementation
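The simplest form of RoPE scaling is linear position interpolation: divide positions by a factor before computing the rotary angles, so a longer context maps into the trained position range (YaRN refines this with frequency-dependent interpolation). A sketch under that assumption; variable names are illustrative and differ from the Hugging Face code.

```python
import torch

def rope_angles(seq_len, head_dim, base=10000.0, scaling_factor=1.0):
    # standard RoPE inverse frequencies
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # the only change vs. vanilla RoPE: positions are divided by the scaling factor
    positions = torch.arange(seq_len).float() / scaling_factor
    angles = torch.outer(positions, inv_freq)        # (seq_len, head_dim // 2)
    return angles.cos(), angles.sin()
```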
- Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey paper, repo
- The What, Why, and How of Context Length Extension Techniques in Large Language Models – A Detailed Survey paper
- Chain of Code: Reasoning with a Language Model-Augmented Code Emulator (from DeepMind and Fei-Fei Li) project page, paper
- Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding (from Google Research) paper
- Knowledge Fusion of Large Language Models (ICLR'24) paper
- PromptBench - a unified library that supports comprehensive evaluation and analysis of LLMs code
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection paper, main page
- learns to retrieve, generate, and critique to enhance the LM's output quality and factuality, outperforming ChatGPT and retrieval-augmented Llama 2 Chat on six tasks.
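A hedged sketch of a retrieve-generate-critique loop in this spirit; `retrieve`, `generate`, and `critique` are hypothetical callables, and the actual Self-RAG model emits special reflection tokens from a single LM rather than calling separate components.

```python
def answer(query, retrieve, generate, critique, n_candidates=3):
    # draft one candidate answer per retrieved passage
    passages = retrieve(query)[:n_candidates]
    candidates = [generate(query, passage) for passage in passages]
    # score each candidate (e.g. relevance / support / usefulness) and keep the best
    scored = [(critique(query, passage, cand), cand)
              for passage, cand in zip(passages, candidates)]
    return max(scored)[1]
```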
- Understanding Retrieval Augmentation for Long-Form Question Answering paper
- evidence documents need to be selected carefully before being given to the LLM
- the order of information in the evidence documents affects the order of information in the generated answer
- Learning to Filter Context for Retrieval-Augmented Generation paper, code
- Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models paper
- Retrieval-Augmented Generation for Large Language Models: A Survey paper
- AutoMix: Automatically Mixing Language Models paper
- routes queries to larger LLMs based on the estimated correctness of a smaller model's answers (sketched below)
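A hedged sketch of that routing pattern: let the small model draft an answer, estimate its correctness, and escalate to the large model only when confidence is low. `small_lm`, `verify`, `large_lm`, and the threshold are hypothetical; AutoMix's actual method uses few-shot self-verification plus a meta-verifier.

```python
def route(query, small_lm, verify, large_lm, threshold=0.7):
    draft = small_lm(query)
    confidence = verify(query, draft)   # estimated probability that the draft is correct
    if confidence >= threshold:
        return draft                    # cheap path: keep the small model's answer
    return large_lm(query)              # expensive path: escalate to the large model
```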
- SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF paper, model
- aligning LLMs without using RLHF
- Fine-tuning Language Models for Factuality paper
- Proof-Pile-2 dataset: includes scientific papers, web data containing mathematics, and mathematical code. link
- #llemma
- AlgebraicStack includes code data in many programming languages; C++ alone accounts for 954.1M tokens.
- RedPajama: An Open Source Recipe to Reproduce LLaMA training dataset code
- Open Platypus paper, data, data size: 24,926
- Generative AI for Math: Part I - MathPile: A Billion-Token-Scale Pretraining Corpus for Math paper, code, HF dataset page
- tokenizer: GPTNeoX-20B
- GPQA: A Graduate-Level Google-Proof Q&A Benchmark (very hard questions) (from Cohere, Anthropic, NYU) paper, data and code, data size: 448
- GAIA: a benchmark for General AI Assistants (from Meta, Yann LeCun) paper, page
- LLMs for Chip Design (from NVIDIA) paper
- Adversarial Attacks on GPT-4 via Simple Random Search paper
- Large Language Models for Software Engineering: Survey and Open Problems paper
- LLM for code generation
- LLM for software testing, debugging, repair
- LLM for documentation generation
- Software Testing with Large Language Models: Survey, Landscape, and Vision paper
- A Survey on Language Models for Code paper, code
- Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks paper (NeurIPS'23)
- The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4 paper
- Data Management For Large Language Models: A Survey (for pretraining and SFT) paper, code
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents paper
- A comprehensive overview of the benefits of training LLMs with code-specific data.
- DPO vs. RLHF (from the Latent Space podcast): RLHF 201 - with Nathan Lambert of AI2 and Interconnects
- LLM course (10k+ stars on GitHub) repo