generated from fastai/nbdev_template
Issues: huggingface/trl
[Tracking issue] Integrate native liger-kernel losses
#2495
opened Dec 17, 2024 by
qgallouedec
Open
5
[Tracking issue] Wrong loss scaling when accumulating gradient
#2617
opened Jan 23, 2025 by
qgallouedec
Open
Issues list
Stop at eos_token for sampling in online algorithms
✨ enhancement
New feature or request
🏋 GRPO
Related to GRPO
🏋 PPO
Related to PPO
#2892
opened Feb 18, 2025 by
haoxiongliu
[Qwen2.5] LoRA with SFT seems to be stuck forever with DeepSpeed
🚀 deepspeed
Related to deepspeed
⚡ PEFT
Related to PEFT
🏋 SFT
Related to SFT
#2891
opened Feb 18, 2025 by
sayakpaul
Reuse the logits in the _prepare_inputs.
✨ enhancement
New feature or request
🏋 GRPO
Related to GRPO
#2888
opened Feb 18, 2025 by
linkedlist771
Bottleneck in GRPO training
✨ enhancement
New feature or request
🏋 GRPO
Related to GRPO
#2887
opened Feb 18, 2025 by
ZYM66
Cause transformers error when use official GRPO examples to exact trainer
🐛 bug
Something isn't working
🏋 GRPO
Related to GRPO
#2886
opened Feb 18, 2025 by
vagitablebirdcode
5 tasks done
PPOTrainer save_model function raises an error when saving: no attribute 'zero_gather_16bit_weights_on_model_save'
🐛 bug
Something isn't working
🚀 deepspeed
Related to deepspeed
🏋 PPO
Related to PPO
#2885
opened Feb 18, 2025 by
Havefun404
Reward oscillates while reproducing R1-zero with GRPO
🏋 GRPO
Related to GRPO
🏋 Reward
Related to Reward modelling
#2884
opened Feb 18, 2025 by
Dong237
5 tasks done
ORPO Shape Mismatches when using Accelerate/Deepspeed
⚡accelerate
Related to accelerate
🐛 bug
Something isn't working
🚀 deepspeed
Related to deepspeed
🏋 ORPO
Related to ORPO
#2882
opened Feb 17, 2025 by
dannnnthemannnn
5 tasks done
tensor shape error occurs when training with GRPO and use_vllm = False
🐛 bug
Something isn't working
🏋 GRPO
Related to GRPO
#2878
opened Feb 17, 2025 by
Saturnoul
5 tasks done
AssertionError grpo
🐛 bug
Something isn't working
🏋 GRPO
Related to GRPO
#2877
opened Feb 17, 2025 by
GuodongFan
5 tasks done
I have this strange error with GRPO Trainer
🐛 bug
Something isn't working
🏋 GRPO
Related to GRPO
#2876
opened Feb 16, 2025 by
MohamedAliRashad
model_init/ref_model_init for reproducible runs
🏋 DPO
Related to DPO
✨ enhancement
New feature or request
🏋 KTO
Related to KTO
⚡ PEFT
Related to PEFT
#2870
opened Feb 15, 2025 by
claralp
How to train GRPO on 2 GPUs, one for training, one for vLLM
⚡accelerate
Related to accelerate
🏋 GRPO
Related to GRPO
⏳ needs more info
Additional information or clarification is required to proceed
⚡ PEFT
Related to PEFT
#2864
opened Feb 14, 2025 by
AIR-hl
5 tasks done
TRL SFT data knowledge cutoff
❓ question
Seeking clarification or more information
🏋 SFT
Related to SFT
#2844
opened Feb 12, 2025 by
shirinyamani
GRPO Trainer has problem with num_processes num
⚡accelerate
Related to accelerate
🐛 bug
Something isn't working
🏋 GRPO
Related to GRPO
#2842
opened Feb 12, 2025 by
MAOJIASONG
5 tasks done
GRPOTrainer fails to transfer weights to vLLM with _move_model_to_vllm after 7.5 hours of the job running
🐛 bug
Something isn't working
🚀 deepspeed
Related to deepspeed
🏋 GRPO
Related to GRPO
#2840
opened Feb 12, 2025 by
casper-hansen
5 tasks done
A bug in grpo_trainer.py
🐛 bug
Something isn't working
🏋 GRPO
Related to GRPO
#2839
opened Feb 12, 2025 by
Carloszone
4 of 5 tasks
Bugs of Online DPO example
⚡accelerate
Related to accelerate
🐛 bug
Something isn't working
🏋 Online DPO
Related to Online DPO
#2835
opened Feb 12, 2025 by
Snowdar
PPOTrainer OOMing when training Phi-4 4bit LoRA
⚡ PEFT
Related to PEFT
🏋 PPO
Related to PPO
#2833
opened Feb 11, 2025 by
sr5434
Are there any tips and tricks about GRPO reward function design?
🏋 GRPO
Related to GRPO
🏋 Reward
Related to Reward modelling
#2832
opened Feb 11, 2025 by
MohamedAliRashad
Simple Agentic framework with batch generation
✨ enhancement
New feature or request
🏋 GRPO
Related to GRPO
#2830
opened Feb 11, 2025 by
August-murr
[GRPO Trainer] Uneven GPU Utilization When Enabling vLLM with Multi-GPU Training
⚡accelerate
Related to accelerate
🚀 deepspeed
Related to deepspeed
🏋 GRPO
Related to GRPO
#2825
opened Feb 11, 2025 by
aeroplanepaper
5 tasks done
Tool usage support in tokenizers for Agentic RL
✨ enhancement
New feature or request
🏋 GRPO
Related to GRPO
#2821
opened Feb 10, 2025 by
August-murr