grpo-rl-training
GRPO/RL Training with TRL
Expert-level guidance for implementing Group Relative Policy Optimization (GRPO) using the Transformer Reinforcement Learning (TRL) library. This skill provides battle-tested patterns, critical insights, and production-ready workflows for fine-tuning language models with custom reward functions.
When to Use This Skill
Use GRPO training when you need to:
- Enforce specific output formats (e.g., XML tags, JSON, structured reasoning)
- Teach verifiable tasks with objective correctness metrics (math, coding, fact-checking)
- Improve reasoning capabilities by rewarding chain-of-thought patterns
- Align models to domain-specific behaviors without labeled preference data
- Optimize for multiple objectives simultaneously (format + correctness + style)
Do NOT use GRPO for:
- Simple supervised fine-tuning tasks (use SFT instead)
- Tasks without clear reward signals
- When you already have high-quality preference pairs (use DPO/PPO instead)
More from ovachiever/droid-tings
security-auditor
Continuous security vulnerability scanning for OWASP Top 10, common vulnerabilities, and insecure patterns. Use when reviewing code, before deployments, or on file changes. Scans for SQL injection, XSS, secrets exposure, auth issues. Triggers on file changes, security mentions, deployment prep.
751react-hook-form-zod
|
459nextjs-shadcn-builder
Build new Next.js applications or migrate existing frontends (React, Vue, Angular, vanilla JS, etc.) to Next.js + shadcn/ui with systematic analysis and conversion. Enforces shadcn design principles - CSS variables for theming, standard UI components, no hardcoded values, consistent typography/colors. Use for creating Next.js apps, migrating frontends, adopting shadcn/ui, or standardizing component libraries. Includes MCP integration for shadcn documentation and automated codebase analysis.
226deep-reading-analyst
Comprehensive framework for deep analysis of articles, papers, and long-form content using 10+ thinking models (SCQA, 5W2H, critical thinking, inversion, mental models, first principles, systems thinking, six thinking hats). Use when users want to: (1) deeply understand complex articles/content, (2) analyze arguments and identify logical flaws, (3) extract actionable insights from reading materials, (4) create study notes or learning summaries, (5) compare multiple sources, (6) transform knowledge into practical applications, or (7) apply specific thinking frameworks. Triggered by phrases like 'analyze this article,' 'help me understand,' 'deep dive into,' 'extract insights from,' 'use [framework name],' or when users provide URLs/long-form content for analysis.
191playwright browser automation
Complete browser automation with Playwright. Auto-detects dev servers, writes clean test scripts to /tmp. Test pages, fill forms, take screenshots, check responsive design, validate UX, test login flows, check links, automate any browser task. Use when user wants to test websites, automate browser interactions, validate web functionality, or perform any browser-based testing.
146threejs-graphics-optimizer
Performance optimization rules for THREE.js and graphics programming. Covers mobile-first optimization, fallback patterns, memory management, render loop efficiency, and general graphics best practices for smooth 60fps experiences across devices.
107