Many papers and blogs related to LLMs emerge every week, so I created this list to collect the ones I'm interested in. The list was started on Oct. 23, 2023, so some important papers published before that date may be missing.
- Llemma: An Open Language Model For Mathematics paper, code
- #mathematics #codellama
- Magicoder: Source Code Is All You Need (a fully open-sourced coding model) paper, code
- Simplifying Transformer Blocks paper
- Alternating Updates for Efficient Transformers (Google Research) (NeurIPS'23) paper
- CogVLM: Visual Expert for Pretrained Language Models paper, code
- Emu Edit: Precise Image Editing via Recognition and Generation Tasks (from Meta) (use text instructions to modify images) paper, blog
- LLaVA: Large Language and Vision Assistant (from Microsoft) (NeurIPS'23) Main Page, code
- Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities (from Google DeepMind) (text, video, audio) paper
- FP8-LM: Training FP8 Large Language Models paper, code
- Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer paper, code
- MoLORA (Mixture of LORA) (from cohere) paper, code
- Gated Linear Attention Transformers with Hardware-Efficient Training paper
- Performance is still worse than transformer-based models
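For readers new to linear attention, a minimal sketch of the recurrent form such models use: a 2-D state is decayed by a data-dependent gate and updated with a key-value outer product. Shapes, the elementwise gate layout, and variable names are my simplifications, not the paper's hardware-efficient chunked formulation.

```python
# Hedged sketch of a gated linear attention recurrence (not the paper's
# chunk-parallel implementation).
import torch

def gated_linear_attention(q, k, v, gate):
    # q, k: (seq_len, d_k); v: (seq_len, d_v); gate: (seq_len, d_k), values in (0, 1)
    d_k, d_v = q.shape[1], v.shape[1]
    state = torch.zeros(d_k, d_v)
    outputs = []
    for t in range(q.shape[0]):
        # decay the state with the gate, then add the new key-value outer product
        state = gate[t].unsqueeze(1) * state + torch.outer(k[t], v[t])
        outputs.append(q[t] @ state)     # read out with the query -> (d_v,)
    return torch.stack(outputs)          # (seq_len, d_v)
```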
- Finetuning LLMs with LoRA and QLoRA: Insights from Hundreds of Experiments (from lightning AI) blog
- LoRA: Low-Rank Adaptation of Large Language Models paper
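The core LoRA idea: freeze the pretrained weight and learn a low-rank update ΔW = BA scaled by α/r. A minimal PyTorch sketch, with the layer name, initialization, and hyperparameters chosen for illustration rather than copied from the paper's reference implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        # in practice this weight is loaded from the pretrained model and frozen
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        # y = x W^T + (alpha / r) * x A^T B^T
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

Because B starts at zero, the adapted layer initially reproduces the frozen base model exactly.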
- LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment paper
- Addresses world-knowledge forgetting during SFT
- A curated reading list of research in Adaptive Computation (AC) & Mixture of Experts (MoE) repo
- MegaBlocks: Efficient Sparse Training with Mixture-of-Experts (“Here's the paper you need to read to understand today” - Sasha Rush) paper
- Mistral MoE base model blog
- Calculate an MoE model by hand post
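In the same back-of-the-envelope spirit as the post above, a tiny script for counting total vs. active MoE feed-forward parameters; every dimension below is a hypothetical placeholder, not a published model configuration.

```python
# Back-of-the-envelope MoE FFN parameter count; all dimensions are illustrative.
d_model = 4096        # hidden size
d_ff = 14336          # FFN inner size
n_experts = 8         # experts per MoE layer
top_k = 2             # experts activated per token
n_layers = 32

# A SwiGLU-style FFN has three weight matrices: gate, up, down.
ffn_params_per_expert = 3 * d_model * d_ff
router_params = d_model * n_experts

total_moe_params = n_layers * (n_experts * ffn_params_per_expert + router_params)
active_moe_params = n_layers * (top_k * ffn_params_per_expert + router_params)

print(f"total MoE FFN params : {total_moe_params / 1e9:.1f} B")
print(f"active per token     : {active_moe_params / 1e9:.1f} B")
```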
- Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch paper, code
- Merging SFT models into the base LLM with the paper's method can improve performance (sketched below)
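A hedged sketch of that merging idea: take each SFT model's delta from the base weights, randomly drop most delta entries, rescale the survivors, and add the result back to the base. The function name, drop rate, and plain summation across models are my assumptions, not the paper's exact recipe.

```python
import torch

def merge_into_base(base_state, finetuned_states, drop_rate=0.9):
    # base_state / finetuned_states: state dicts of float weight tensors with matching keys
    merged = {name: w.clone() for name, w in base_state.items()}
    for ft_state in finetuned_states:
        for name, base_w in base_state.items():
            delta = ft_state[name] - base_w                       # task-specific delta
            mask = (torch.rand_like(delta) > drop_rate).float()   # randomly drop most entries
            delta = delta * mask / (1.0 - drop_rate)              # rescale the survivors
            merged[name] += delta
    return merged
```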
- S-LoRA (batched inference over many LoRA adapters) code
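To make the "batched LoRA" idea concrete, here is a sketch of one linear layer serving a batch in which every request selects a different adapter; the gather-based indexing is an illustrative simplification of what a real serving system such as S-LoRA does.

```python
import torch

def batched_lora_forward(x, base_W, lora_A, lora_B, adapter_ids, scaling=2.0):
    # x:           (batch, d_in) activations, one row per request
    # base_W:      (d_out, d_in) shared frozen base weight
    # lora_A:      (n_adapters, r, d_in) stacked adapter A matrices
    # lora_B:      (n_adapters, d_out, r) stacked adapter B matrices
    # adapter_ids: (batch,) which adapter each request uses
    A = lora_A[adapter_ids]                          # (batch, r, d_in)
    B = lora_B[adapter_ids]                          # (batch, d_out, r)
    base_out = x @ base_W.T                          # shared dense compute
    lora_out = torch.einsum("bd,brd->br", x, A)      # per-request x A^T
    lora_out = torch.einsum("br,bor->bo", lora_out, B)
    return base_out + scaling * lora_out
```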
- blogs:
- LLM series notes: quantitative analysis and acceleration of LLM inference (LLM系列笔记: LLM Inference量化分析与加速)
- How to make LLMs go fast blog
- PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU paper, code
- significantly reduces GPU memory demands and CPU-GPU data transfer
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory (From Apple) paper
- Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models (from DeepMind) paper
- Efficient Streaming Language Models with Attention Sinks paper, open source implementation
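The serving trick behind attention sinks can be sketched as a KV-cache eviction policy: always keep the first few tokens plus a sliding window of recent ones. Parameter names are illustrative, and the paper's re-assignment of positions inside the cache is omitted here.

```python
def evict_kv_cache(keys, values, n_sink=4, window=2044):
    # keys, values: Python lists of per-token K/V entries, ordered oldest to newest
    if len(keys) <= n_sink + window:
        return keys, values
    # keep the "sink" tokens at the start plus the most recent window
    keep_keys = keys[:n_sink] + keys[-window:]
    keep_values = values[:n_sink] + values[-window:]
    return keep_keys, keep_values
```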
- YaRN: Efficient Context Window Extension of Large Language Models paper, YaRN on Mistral-7b-128k
- RoPE scaling post, Hugging Face implementation
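The simplest form of RoPE scaling is linear position interpolation: divide positions by a factor before computing the rotary angles, so a longer context maps into the trained position range (YaRN refines this with frequency-dependent interpolation). A sketch under that assumption; variable names are illustrative and differ from the Hugging Face code.

```python
import torch

def rope_angles(seq_len, head_dim, base=10000.0, scaling_factor=1.0):
    # standard RoPE inverse frequencies
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # the only change vs. vanilla RoPE: positions are divided by the scaling factor
    positions = torch.arange(seq_len).float() / scaling_factor
    angles = torch.outer(positions, inv_freq)        # (seq_len, head_dim // 2)
    return angles.cos(), angles.sin()
```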
- Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey paper, repo
- The What, Why, and How of Context Length Extension Techniques in Large Language Models – A Detailed Survey paper
- Chain of Code: Reasoning with a Language Model-Augmented Code Emulator (from DeepMind and Fei-Fei Li) project page, paper
- Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding (from Google Research) paper
- Knowledge Fusion of Large Language Models (ICLR'24) paper
- PromptBench - a unified library that supports comprehensive evaluation and analysis of LLMs code
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection paper, main page
- learns to retrieve, generate, and critique to enhance the LM's output quality and factuality, outperforming ChatGPT and retrieval-augmented Llama 2 Chat on six tasks.
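A hedged sketch of a retrieve-generate-critique loop in this spirit; `retrieve`, `generate`, and `critique` are hypothetical callables, and the actual Self-RAG model emits special reflection tokens from a single LM rather than calling separate components.

```python
def answer(query, retrieve, generate, critique, n_candidates=3):
    # draft one candidate answer per retrieved passage
    passages = retrieve(query)[:n_candidates]
    candidates = [generate(query, passage) for passage in passages]
    # score each candidate (e.g. relevance / support / usefulness) and keep the best
    scored = [(critique(query, passage, cand), cand)
              for passage, cand in zip(passages, candidates)]
    return max(scored)[1]
```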
- Understanding Retrieval Augmentation for Long-Form Question Answering paper
- evidence documents need to be selected carefully before being given to the LLM
- the order of information in the evidence documents affects the order of information in the generated answer
- Learning to Filter Context for Retrieval-Augmented Generation paper, code
- Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models paper
- Retrieval-Augmented Generation for Large Language Models: A Survey paper
- AutoMix: Automatically Mixing Language Models paper
- routes queries to larger LLMs based on the estimated correctness of a smaller model's answers (sketched below)
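A hedged sketch of that routing pattern: let the small model draft an answer, estimate its correctness, and escalate to the large model only when confidence is low. `small_lm`, `verify`, `large_lm`, and the threshold are hypothetical; AutoMix's actual method uses few-shot self-verification plus a meta-verifier.

```python
def route(query, small_lm, verify, large_lm, threshold=0.7):
    draft = small_lm(query)
    confidence = verify(query, draft)   # estimated probability that the draft is correct
    if confidence >= threshold:
        return draft                    # cheap path: keep the small model's answer
    return large_lm(query)              # expensive path: escalate to the large model
```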
- SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF paper, model
- aligning LLMs without using RLHF
- Fine-tuning Language Models for Factuality paper
- Proof-Pile-2 dataset: includes scientific papers, web data containing mathematics, and mathematical code. link
- #llemma
- AlgebraicStack includes code data in many programming languages; C++ alone accounts for 954.1M tokens.
- RedPajama: An Open Source Recipe to Reproduce LLaMA training dataset code
- Open Platypus paper, data, data size: 24,926
- Generative AI for Math: Part I - MathPile: A Billion-Token-Scale Pretraining Corpus for Math paper, code, HF dataset page
- tokenizer: GPTNeoX-20B
- GPQA: A Graduate-Level Google-Proof Q&A Benchmark (very hard questions) (from Cohere, Anthropic, NYU) paper, data and code, data size: 448
- GAIA: a benchmark for General AI Assistants (from Meta, Yann LeCun) paper, page
- LLMs for Chip Design (from NVIDIA) paper
- Adversarial Attacks on GPT-4 via Simple Random Search paper
- Large Language Models for Software Engineering: Survey and Open Problems paper
- LLM for code generation
- LLM for software testing, debugging, repair
- LLM for documentation generation
- Software Testing with Large Language Models: Survey, Landscape, and Vision paper
- A Survey on Language Models for Code paper, code
- Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks paper (NeurIPS'23)
- The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4 paper
- Data Management For Large Language Models: A Survey (for pretraining and SFT) paper, code
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents paper
- A comprehensive overview of the benefits of training LLMs with code-specific data.
- DPO vs. RLHF (from the Latent Space podcast): RLHF 201 - with Nathan Lambert of AI2 and Interconnects
- LLM course (10k+ stars on GitHub) repo