LLM-paper-list

Many LLM-related papers and blog posts appear every week, so I created this list to collect the ones I'm interested in. The list was started on Oct. 23, 2023, so some important papers published before that date may be missing.

Models

  • Llemma: An Open Language Model For Mathematics paper, code

    • #mathematics #codellama
  • Magicoder: Source Code Is All You Need (a fully open sourced coding model) paper, code

Transformer design

  • Simplifying Transformer Blocks paper
  • Alternating Updates for Efficient Transformers (Google Research) (NeurIPS'23) paper

Multimodal LLM

  • CogVLM: Visual Expert for Pretrained Language Models paper, code
  • Emu Edit: Precise Image Editing via Recognition and Generation Tasks (from Meta) (use text instructions to modify images) paper, blog
  • LLaVA: Large Language and Vision Assistant (from Microsoft) (NeurIPS'23) Main Page, code
  • Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities (from Google DeepMind) (text, video, audio) paper

Efficient training

  • FP8-LM: Training FP8 Large Language Models paper, code
  • Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer paper, code
  • MoLoRA (Mixture of LoRA) (from Cohere) paper, code
  • Gated Linear Attention Transformers with Hardware-Efficient Training paper
    • Performance is still worse than standard Transformer-based models

Hyperparameter tuning

  • Finetuning LLMs with LoRA and QLoRA: Insights from Hundreds of Experiments (from Lightning AI) blog

Parameter-Efficient fine-tuning

  • LoRA: Low-Rank Adaptation of Large Language Models paper
  • LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment paper
    • Dealing with world knowledge forgetting during SFT
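As a reminder of the core idea behind these papers, LoRA freezes the pretrained weight matrix and learns only a low-rank update added to it. Below is a minimal sketch in plain Python; all dimensions, names, and the scaling convention (`alpha / r`) are illustrative, not any library's actual API:

```python
# Toy LoRA forward pass: y = x @ W + (alpha / r) * x @ (A @ B),
# where W (d x k) is frozen and only A (d x r) and B (r x k) are trained.
# Lists of lists stand in for real tensors.

def matmul(X, Y):
    """Naive matrix multiply over lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(x, W, A, B, alpha, r):
    base = matmul(x, W)              # frozen pretrained path
    delta = matmul(x, matmul(A, B))  # trainable low-rank path
    scale = alpha / r                # common LoRA scaling convention
    return [[b + scale * d for b, d in zip(brow, drow)]
            for brow, drow in zip(base, delta)]
```

Because rank r is much smaller than d and k, the trainable parameter count drops from d*k to r*(d + k), which is the whole point of the method.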

Mixture of Experts

  • A curated reading list of research in Adaptive Computation (AC) & Mixture of Experts (MoE) repo
  • MegaBlocks: Efficient Sparse Training with Mixture-of-Experts (“Here's the paper you need to read understand today” - Sasha Rush) paper
  • Mistral MoE base model blog
  • Calculate an MoE model by hand post
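For the "by hand" intuition, the gating step of a sparse MoE layer can be sketched as follows. Everything here (top-2 routing, dot-product gate, the expert functions) is a toy illustration, not any particular model's implementation:

```python
import math

def top2_moe(x, gate_w, experts):
    """Toy top-2 MoE layer: score experts with a linear gate, keep the
    two highest-scoring ones, softmax their logits, and return the
    weighted sum of their outputs."""
    # One logit per expert: x . gate_w[e]
    logits = [sum(xi * wi for xi, wi in zip(x, w)) for w in gate_w]
    # Indices of the two largest logits (descending)
    chosen = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:2]
    # Softmax over only the chosen experts' logits (numerically stable)
    m = max(logits[e] for e in chosen)
    exps = {e: math.exp(logits[e] - m) for e in chosen}
    z = sum(exps.values())
    # Combine the chosen experts' outputs with the softmax weights
    out = [0.0] * len(x)
    for e in chosen:
        weight = exps[e] / z
        y = experts[e](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, chosen
```

Only the selected experts run, which is why MoE models can have far more total parameters than active parameters per token.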

After tuning

  • Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch paper, code
    • Merging SFT models back into the base LLM with their proposed method can improve performance.
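The merging idea can be sketched as a drop-and-rescale (DARE-style) delta merge: randomly drop most of the fine-tuned deltas and rescale the survivors so the expected update is preserved. The flat parameter lists, function name, and seed below are illustrative, not the paper's implementation:

```python
import random

def dare_merge(base, finetuned, p=0.9, seed=0):
    """Drop each delta (finetuned - base) with probability p, rescale
    the kept deltas by 1/(1-p), and add them back to the base weights.
    Flat float lists stand in for real model tensors."""
    rng = random.Random(seed)
    merged = []
    for b, f in zip(base, finetuned):
        delta = f - b
        if rng.random() < p:
            delta = 0.0                 # dropped
        else:
            delta = delta / (1.0 - p)   # rescaled to keep E[delta] unchanged
        merged.append(b + delta)
    return merged
```

With p=0 this degenerates to plain delta addition; the surprising finding in the paper is that very high drop rates still preserve the fine-tuned abilities.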

Efficient inference

  • S-LoRA (batched serving of many LoRA adapters) code
  • PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU paper, code
    • significantly reduces GPU memory demands and CPU-GPU data transfer
  • LLM in a flash: Efficient Large Language Model Inference with Limited Memory (From Apple) paper

Making the decoding process faster:

  • Lookahead Decoding blog
  • PaSS: Parallel Speculative Sampling (from Apple)(NeurIPS'23) paper
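Both entries above speed up decoding by proposing several tokens cheaply and verifying them with the full model. A greedy toy sketch of one draft-and-verify step (the model callables, token representation, and acceptance rule are all simplified stand-ins for the real sampling-based algorithms):

```python
def speculative_decode_step(prefix, draft_next, target_next, k=4):
    """One step of greedy speculative decoding: a cheap draft model
    proposes k tokens; the target model keeps the longest agreeing
    prefix, then emits its own token at the first disagreement
    (or a bonus token if everything agreed)."""
    # 1. Draft model proposes k tokens autoregressively.
    proposed = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)
    # 2. Target model verifies the proposals in order.
    accepted = []
    ctx = list(prefix)
    for t in proposed:
        if target_next(ctx) == t:
            accepted.append(t)
            ctx.append(t)
        else:
            break
    # 3. The target always contributes one more token, so each step
    #    emits between 1 and k+1 tokens.
    accepted.append(target_next(ctx))
    return accepted
```

The speedup comes from step 2: in a real implementation the target model scores all k proposals in one parallel forward pass instead of k sequential ones.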

In-context learning

  • Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models (from DeepMind) paper

Long-sequence

Dynamic Adaptive Prompt Engineering

  • Chain of Code: Reasoning with a Language Model-Augmented Code Emulator (from DeepMind and Fei-Fei Li) project page, paper
  • Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding (from Google Research) paper

Knowledge Fusion

  • Knowledge Fusion of Large Language Models (ICLR'24) paper

Evaluation

  • PromptBench - a unified library that supports comprehensive evaluation and analysis of LLMs code

RAG

  • Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection paper, main page
    • Learns to retrieve, generate, and critique to enhance the LM's output quality and factuality, outperforming ChatGPT and retrieval-augmented Llama 2 Chat on six tasks.
  • Understanding Retrieval Augmentation for Long-Form Question Answering paper
    • Evidence documents should be selected and added to the LLM carefully.
    • The order of information in the evidence documents impacts the order of information in the generated answer.
  • Learning to Filter Context for Retrieval-Augmented Generation paper, code
  • Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models paper
  • Retrieval-Augmented Generation for Large Language Models: A Survey paper

Cost saving

  • AutoMix: Automatically Mixing Language Models paper
    • Routes queries to larger LLMs based on verifying the outputs of smaller language models.

Alignment

  • SteerLM: Attribute Conditioned SFT as a (User-Steerable) Alternative to RLHF paper, model
    • Aligns LLMs without using RLHF.
  • Fine-tuning Language Models for Factuality paper

Datasets

  • Proof-Pile-2 dataset: comprises scientific papers, web data containing mathematics, and mathematical code. link
    • #llemma
    • AlgebraicStack covers code in many programming languages; C++ alone accounts for 954.1M tokens.
  • RedPajama: An Open Source Recipe to Reproduce LLaMA training dataset code
  • Open Platypus paper, data, data size: 24,926
  • Generative AI for Math: Part I - MathPile: A Billion-Token-Scale Pretraining Corpus for Math paper, code, HF dataset page
    • tokenizer: GPTNeoX-20B

Benchmarks

  • GPQA: A Graduate-Level Google-Proof Q&A Benchmark (very hard questions) (from Cohere, Anthropic, NYU) paper, data and code, data size: 448
  • GAIA: a benchmark for General AI Assistants (from Meta, Yann LeCun) paper, page

Domain adaptation

  • LLMs for Chip Design paper (from NVIDIA)

Adversarial Attacks

  • Adversarial Attacks on GPT-4 via Simple Random Search paper

Review paper

  • Large Language Models for Software Engineering: Survey and Open Problems paper
    • LLM for code generation
    • LLM for software testing, debugging, repair
    • LLM for documentation generation
  • Software Testing with Large Language Models: Survey, Landscape, and Vision paper
  • A Survey on Language Models for Code paper, code
  • Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks paper (NeurIPS'23)
  • The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4 paper
  • Data Management For Large Language Models: A Survey (for pretraining and SFT) paper, code
  • If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents paper
    • A comprehensive overview of the benefits of training LLMs with code-specific data.

Not papers, but good discussions

Good 101 materials
