# Reinforcement Learning Best Practices

## Overview
This skill provides comprehensive guidance for implementing reinforcement learning in Python using the modern ecosystem (2024-2025). Gymnasium has replaced OpenAI Gym as the standard environment interface. Stable-Baselines3 (SB3) is recommended for prototyping, RLlib for production/distributed training, and CleanRL for research.
## When to Use
- Building RL agents for discrete or continuous control tasks
- Creating custom simulation environments
- Tuning hyperparameters for RL algorithms
- Debugging training issues (reward curves, policy collapse, numerical instability)
- Deploying trained policies to production
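For the first use case above, a dependency-free sketch: tabular Q-learning on a hypothetical 5-state chain MDP. The environment, state layout, and hyperparameters here are illustrative, not taken from any library:

```python
import random

# Hypothetical chain MDP: states 0..4, actions 0 (left) / 1 (right);
# reaching state 4 (the goal) yields reward 1 and ends the episode.
N_STATES, GOAL = 5, 4

def env_step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def train(episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            a = rng.randrange(2) if rng.random() < eps else max(0, 1, key=lambda x: q[s][x])
            s2, r, done = env_step(s, a)
            # Q-learning update: bootstrap from max Q of next state unless terminal
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

q = train()
# Greedy policy per state; after training it should prefer action 1 (right)
# in every non-terminal state of the chain.
policy = [max(0, 1, key=lambda a: q[s][a]) for s in range(N_STATES)]
```

The same update rule is what deep RL methods like DQN approximate with a neural network in place of the table, which is why inspecting a tabular version first is a useful debugging baseline.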
## Library Selection
| Library | Best For | Ease of Use | Flexibility | Production Readiness |
|---|---|---|---|---|
| Stable-Baselines3 | Prototyping, learning | High | Medium | Good |
| RLlib | Production, distributed training | Medium | High | Excellent |
| CleanRL | Research, single-file reference implementations | Medium | High | Limited |