Issues: huggingface/trl

Issues list

Stop at eos_token for sampling in online algorithms ✨ enhancement New feature or request 🏋 GRPO Related to GRPO 🏋 PPO Related to PPO
#2892 opened Feb 18, 2025 by haoxiongliu
[Qwen2.5] LoRA with SFT seems to be stuck forever with DeepSpeed 🚀 deepspeed Related to deepspeed ⚡ PEFT Related to PEFT 🏋 SFT Related to SFT
#2891 opened Feb 18, 2025 by sayakpaul
Reuse the logits in the _prepare_inputs. ✨ enhancement New feature or request 🏋 GRPO Related to GRPO
#2888 opened Feb 18, 2025 by linkedlist771
Bottleneck in GRPO training ✨ enhancement New feature or request 🏋 GRPO Related to GRPO
#2887 opened Feb 18, 2025 by ZYM66
Cause transformers error when use official GRPO examples to exact trainer 🐛 bug Something isn't working 🏋 GRPO Related to GRPO
#2886 opened Feb 18, 2025 by vagitablebirdcode
5 tasks done
Reward oscillates while reproducing R1-zero with GRPO 🏋 GRPO Related to GRPO 🏋 Reward Related to Reward modelling
#2884 opened Feb 18, 2025 by Dong237
5 tasks done
Is vllm 0.6 OK? 🐛 bug Something isn't working
#2883 opened Feb 18, 2025 by catsled
ORPO Shape Mismatches when using Accelerate/Deepspeed ⚡accelerate Related to accelerate 🐛 bug Something isn't working 🚀 deepspeed Related to deepspeed 🏋 ORPO Related to ORPO
#2882 opened Feb 17, 2025 by dannnnthemannnn
5 tasks done
tensor shape error occurs when training with GRPO and use_vllm = False 🐛 bug Something isn't working 🏋 GRPO Related to GRPO
#2878 opened Feb 17, 2025 by Saturnoul
5 tasks done
AssertionError grpo 🐛 bug Something isn't working 🏋 GRPO Related to GRPO
#2877 opened Feb 17, 2025 by GuodongFan
5 tasks done
I have this strange error with GRPO Trainer 🐛 bug Something isn't working 🏋 GRPO Related to GRPO
#2876 opened Feb 16, 2025 by MohamedAliRashad
GRPO finetune with LORA leads to poor performance 🐛 bug Something isn't working 🏋 GRPO Related to GRPO ⚡ PEFT Related to PEFT
#2872 opened Feb 15, 2025 by zaddy6
5 tasks done
model_init/ref_model_init for reproducible runs 🏋 DPO Related to DPO ✨ enhancement New feature or request 🏋 KTO Related to KTO ⚡ PEFT Related to PEFT
#2870 opened Feb 15, 2025 by claralp
How to train GRPO on 2 GPUs, one for training, one for vllm ⚡accelerate Related to accelerate 🏋 GRPO Related to GRPO ⏳ needs more info Additional information or clarification is required to proceed ⚡ PEFT Related to PEFT
#2864 opened Feb 14, 2025 by AIR-hl
5 tasks done
TRL SFT data knowledge cutoff ❓ question Seeking clarification or more information 🏋 SFT Related to SFT
#2844 opened Feb 12, 2025 by shirinyamani
GRPO Trainer has problem with num_processes num ⚡accelerate Related to accelerate 🐛 bug Something isn't working 🏋 GRPO Related to GRPO
#2842 opened Feb 12, 2025 by MAOJIASONG
5 tasks done
A bug in grpo_trainer.py 🐛 bug Something isn't working 🏋 GRPO Related to GRPO
#2839 opened Feb 12, 2025 by Carloszone
4 of 5 tasks
Bugs of Online DPO example ⚡accelerate Related to accelerate 🐛 bug Something isn't working 🏋 Online DPO Related to Online DPO
#2835 opened Feb 12, 2025 by Snowdar
PPOTrainer OOMing when training Phi-4 4bit LoRA ⚡ PEFT Related to PEFT 🏋 PPO Related to PPO
#2833 opened Feb 11, 2025 by sr5434
Are there any tips and tricks about GRPO reward function design ? 🏋 GRPO Related to GRPO 🏋 Reward Related to Reward modelling
#2832 opened Feb 11, 2025 by MohamedAliRashad
Simple Agentic framework with batch generation ✨ enhancement New feature or request 🏋 GRPO Related to GRPO
#2830 opened Feb 11, 2025 by August-murr
[GRPO Trainer] Uneven GPU Utilization When Enabling vLLM with Multi-GPU Training ⚡accelerate Related to accelerate 🚀 deepspeed Related to deepspeed 🏋 GRPO Related to GRPO
#2825 opened Feb 11, 2025 by aeroplanepaper
5 tasks done
Tool usage support in tokenizers for Agentic RL ✨ enhancement New feature or request 🏋 GRPO Related to GRPO
#2821 opened Feb 10, 2025 by August-murr