Code LLM

- Survey
- Code LLM
- Datasets
- Evaluation
- Projects
- Products
- Misc

Survey

Code LLM

ProgCo: Program Helps Self-Correction of Large Language Models, arXiv, 2501.01264, arxiv, pdf, citation: -1

Xiaoshuai Song, Yanan Wu, Weixun Wang, ..., Wenbo Su, Bo Zheng
Dynamic Scaling of Unit Tests for Code Reward Modeling, arXiv, 2501.01054, arxiv, pdf, citation: -1

Zeyao Ma, Xiaokang Zhang, Jing Zhang, ..., Sijia Luo, Jie Tang
Training Software Engineering Agents and Verifiers with SWE-Gym, arXiv, 2412.21139, arxiv, pdf, citation: -1

Jiayi Pan, Xingyao Wang, Graham Neubig, ..., Alane Suhr, Yizhe Zhang · (SWE-Gym - SWE-Gym)
Outcome-Refining Process Supervision for Code Generation, arXiv, 2412.15118, arxiv, pdf, citation: -1

Zhuohao Yu, Weizheng Gu, Yidong Wang, ..., Wei Ye, Shikun Zhang · (ORPS - zhuohaoyu)
🌟 o1-Coder: an o1 Replication for Coding, arXiv, 2412.00154, arxiv, pdf, citation: -1

Yuxiang Zhang, Shangxi Wu, Yuqi Yang, ..., Chao Kong, Jitao Sang · (O1-CODER - ADaM-BJTU)
Leveraging training and search for better software engineering agents

· (𝕏)
Bug fixes & analysis for Qwen 2.5 𝕏

· (t)
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models, arXiv, 2411.05830, arxiv, pdf, citation: -1

Nizar Islah, Justine Gehring, Diganta Misra, ..., Terry Yue Zhuo, Massimo Caccia · (GitChameleon - NizarIslah)
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). 🤗
🌟 Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level, arXiv, 2411.03562, arxiv, pdf, citation: -1

Antoine Grosnit, Alexandre Maraval, James Doran, ..., Haitham Bou-Ammar, Jun Wang
🌟 OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models, arXiv, 2411.04905, arxiv, pdf, citation: -1

Siming Huang, Tianhao Cheng, Jason Klein Liu, ..., Yinghui Xu, Wei Chu · (opencoder-llm.github)
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions, arXiv, 2410.20424, arxiv, pdf, citation: -1

Ziming Li, Qianbo Zang, David Ma, ..., Wenhao Huang, Ge Zhang · (AutoKaggle - multimodal-art-projection)
SelfCodeAlign: Self-Alignment for Code Generation, arXiv, 2410.24198, arxiv, pdf, citation: -1

Yuxiang Wei, Federico Cassano, Jiawei Liu, ..., Arjun Guha, Lingming Zhang · (selfcodealign - bigcode-project)
Learning Code Preference via Synthetic Evolution

Datasets

CUDABench

Evaluation

The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models, arXiv, 2501.09653, arxiv, pdf, citation: -1

Jonathan Katzy, Razvan Mihai Popescu, Arie van Deursen, ..., Maliheh Izadi
🌟 CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings, arXiv, 2501.01257, arxiv, pdf, citation: -1

Shanghaoran Quan, Jiaxi Yang, Bowen Yu, ..., Binyuan Hui, Junyang Lin · (codeelo-bench.github)
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation, arXiv, 2412.21199, arxiv, pdf, citation: -1

Zhaojian Yu, Yilun Zhao, Arman Cohan, ..., Xiao-Ping Zhang · (CodeEval-Pro - CodeEval-Pro) · (answers111.github)
Evaluating and Aligning CodeLLMs on Human Preference, arXiv, 2412.05210, arxiv, pdf, citation: -1

Jian Yang, Jiaxi Yang, Ke Jin, ..., Binyuan Hui, Junyang Lin · (Qwen2.5-Coder - QwenLM) · (arxiv) · (huggingface)
KernelBench, a new code generation benchmark for evaluating models' ability to generate correct and efficient CUDA kernels. 𝕏
Copilot Arena
Can Language Models Replace Programmers? REPOCOD Says 'Not Yet', arXiv, 2410.21647, arxiv, pdf, citation: -1

Shanchao Liang, Yiran Hu, Nan Jiang, ..., Lin Tan
cursor + claude is cool but not coming for our jobs either imo 𝕏
Aider LLM Leaderboards
SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI, arXiv, 2410.11096, arxiv, pdf, citation: -1

Yu Yang, Yuzhou Nie, Zhun Wang, ..., Bo Li, Dawn Song · (seccodeplt.github) · (huggingface)

Projects

deepseek-engineer - Doriandarko
SWE-Gym - SWE-Gym
gitingest - cyclotruc
llamacoder - Nutlope
MPLSandbox - Ablustrund

· (arxiv)
Lingma-SWE-GPT - LingmaTongyi

SoftWare Engineering Process Data Synthesis and Inference Workflow for Lingma SWE-GPT
Awesome-Code-LLM - huybery
aider - Aider-AI
screenshot-to-code - abi
composio - ComposioHQ
fast-apply - kortix-ai

Pipeline for Data Generation & Fine-Tuning Qwen2.5 Coder Models
sage - Storia-AI

Chat with any codebase

Products

PearAI: The Open Source AI Code Editor
NEO: A fully autonomous Machine Learning Engineer
The first agentic IDE, and then some. The Windsurf Editor is where the work of developers and AI truly flow together, allowing for a coding experience that feels like literal magic
Edit your codebase and run commands quickly with natural language in your terminal.
Find out how we’re evolving GitHub and GitHub Copilot—and get access to the latest previews and GA releases.

Misc

Best of 2024 in Agents (from #1 on SWE-Bench Full, Prof. Graham Neubig of OpenHands/AllHands) 🎬
geminiCoder - osanseviero
Windsurf Cascade Leaked System prompt

· (reddit)
repomix - yamadashy
Qwen / Qwen2.5-Coder-Artifacts 🤗
Using Large Language Models To Catch Vulnerabilities In Real-World Code

· (jiqizhixin)
