unsloth-grpo
@cuba6112/skillfactory0
0
Implementation of Group Relative Policy Optimization (GRPO) for training reasoning models, optimized for 8x memory savings (triggers: GRPO, reasoning, DeepSeek-R1, reinforcement learning, RLVR, GRPOTrainer, thinking tokens).
Install Skill
1Download skill
2Enable skills in Claude
Open claude.ai/settings/capabilities and find the "Skills" section
3Upload to Claude
Click "Upload skill" and select the downloaded ZIP file
Note: Please verify skill by going through its instructions before using it.