Welcome to the GitHub repository of the Peking University ML System Lab - Baichuan Inc. Joint Laboratory.
We are dedicated to advancing research in Data-Centric Machine Learning (DCML), Large Language Models (LLMs), and Machine Learning Systems (ML Systems).
Our goal is to develop effective and efficient data preparation systems and algorithms that improve the performance of machine learning models.
🔥 2024/10/21 BUTTON: Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning 🌴 Repo 🌲 arXiv
🔥 2024/10/15 FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback 🌴 Repo 🌲 arXiv
🔥 2024/09/29 BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate Based Efficient Tree Search 🌲 arXiv
🔥 2024/09/26 Data Proportion Detection for Optimized Data Management for Large Language Models [Vision] 🌲 arXiv
🔥 2024/09/02 DataSculpt: Crafting Data Landscapes for LLM Post-Training through Multi-objective Partitioning 🌲 arXiv
🔥 2024/08/27 BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline 🌴 Repo 🌲 arXiv
🔥 2024/08/21 MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark 🌴 Repo 🌲 arXiv
🔥 2024/08/20 SysBench: Can Large Language Models Follow System Messages? 🌴 Repo 🌲 arXiv
🔥 2024/08/14 The Llama3-PBM-Nova-70B model is released! 🤗 Huggingface
🔥 2024/08/07 PAS: Data-Efficient Plug-and-Play Prompt Augmentation System 🤗 Huggingface 🌴 Repo 🌲 arXiv
🔥 2024/08/02 CFBench: A Comprehensive Constraints-Following Benchmark for LLMs 🌴 Repo 🌲 arXiv