openclaw-rl-training
OpenClaw-RL Training Skill
Skill by ara.so, part of the Hermes Skills collection.
Overview
OpenClaw-RL is a fully asynchronous reinforcement learning framework that trains personalized AI agents from natural conversation feedback. It wraps self-hosted models in an OpenClaw-compatible API, intercepts live multi-turn conversations, and continuously optimizes the policy in the background without interrupting usage.
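The background-optimization loop described above can be illustrated with a minimal `asyncio` sketch. It collapses the four components into two stages (a serving coroutine and a training coroutine joined by a queue). It also uses hypothetical names (`Turn`, `serve`, `train`) and stand-in strings in place of real model calls. This is not OpenClaw-RL's API. It only shows why serving never has to wait on training.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Turn:
    """One intercepted conversation turn (hypothetical structure)."""
    prompt: str
    response: str
    feedback: str  # the user's next message, later usable as a learning signal


async def serve(queue: asyncio.Queue, n_turns: int) -> list[str]:
    """Hypothetical serving stage: reply immediately, then enqueue the turn."""
    replies = []
    for i in range(n_turns):
        reply = f"reply-{i}"              # stand-in for the model's generation
        replies.append(reply)             # the user is answered without waiting on training
        await queue.put(Turn(f"prompt-{i}", reply, f"feedback-{i}"))
        await asyncio.sleep(0)            # yield so the trainer runs concurrently
    await queue.put(None)                 # sentinel: no more traffic
    return replies


async def train(queue: asyncio.Queue, batch_size: int) -> list[int]:
    """Hypothetical training stage: consume turns in the background, one update per batch."""
    batch, update_sizes = [], []
    while True:
        turn = await queue.get()
        if turn is None:
            break
        batch.append(turn)
        if len(batch) == batch_size:
            update_sizes.append(len(batch))   # stand-in for a policy-gradient step
            batch = []
    if batch:
        update_sizes.append(len(batch))       # flush the final partial batch
    return update_sizes


async def main() -> tuple[list[str], list[int]]:
    queue: asyncio.Queue = asyncio.Queue()
    replies, updates = await asyncio.gather(serve(queue, n_turns=5),
                                            train(queue, batch_size=2))
    return replies, updates


replies, updates = asyncio.run(main())
print(len(replies), updates)  # 5 [2, 2, 1]
```

In a real deployment, the rollout and evaluation components would sit between these two stages to turn raw turns into scored trajectories before they reach the trainer.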
Key capabilities:
- Fully async 4-component architecture (serving, rollout, evaluation, training)
- Three learning paradigms: Binary RL (GRPO), On-Policy Distillation (OPD), Hybrid Combine
- Self-hosted and private: runs entirely on your own infrastructure
- Supports personal agent optimization and general agentic RL (terminal, GUI, SWE, tool-call)
- Zero manual labeling: training trajectories are created automatically from conversations
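The two per-sample learning signals named in the list can be sketched in plain Python. The sketch makes two assumptions. For GRPO, it uses the textbook formulation, where each response's reward is normalized against the mean and population standard deviation of its sampled group. For on-policy distillation, it uses a common formulation: per-token teacher log-probability minus student log-probability on tokens the student sampled itself. The function names are hypothetical, and OpenClaw-RL's actual losses may differ, including how Hybrid Combine mixes the two.

```python
import math
from statistics import mean, pstdev


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Hypothetical helper: group-relative advantages as in textbook GRPO.

    `rewards` holds scalar rewards (binary 0/1 here) for several responses
    sampled for the same prompt; uses the population std of the group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def opd_token_advantages(student_logprobs: list[float],
                         teacher_logprobs: list[float]) -> list[float]:
    """Hypothetical helper: per-token distillation signal, assumed to be the
    teacher log-prob minus the student log-prob of each sampled token."""
    if len(student_logprobs) != len(teacher_logprobs):
        raise ValueError("log-prob sequences must align token by token")
    return [t - s for s, t in zip(student_logprobs, teacher_logprobs)]


# Four sampled responses: two judged good (1) and two judged bad (0).
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 3) for a in adv])  # [1.0, -1.0, 1.0, -1.0]

# Teacher is more confident than the student on token 0 and less on token 1.
tok_adv = opd_token_advantages([math.log(0.5), math.log(0.5)],
                               [math.log(0.8), math.log(0.25)])
print([round(a, 3) for a in tok_adv])  # [0.47, -0.693]
```

Because rewards are normalized within each group, a group in which every response receives the same reward yields zero advantage and contributes no gradient. This is one reason GRPO samples several responses per prompt.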