ai-agents
AI Agents Development — Production Skill Hub
Modern Best Practices (March 2026): deterministic control flow, bounded tools, auditable state, MCP-based tool integration, handoff-first orchestration, multi-layer guardrails, OpenTelemetry tracing, and human-in-the-loop controls (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
This skill provides production-ready operational patterns for designing, building, evaluating, and deploying AI agents. It centralizes procedures, checklists, decision rules, and templates used across RAG agents, tool-using agents, OS agents, and multi-agent systems.
No theory. No narrative. Only operational steps and templates.
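As a minimal sketch of how several of the practices listed above (bounded tools, a deterministic step budget, auditable state, and a human-in-the-loop gate) might fit together, consider the template below. All names here (`Tool`, `AgentRun`, `needs_approval`, `approve`) are hypothetical illustrations, not part of this skill or of any specific framework, and the plan is assumed to be a precomputed list of tool calls rather than live model output.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tool:
    """A registered tool. Hypothetical shape, for illustration only."""
    name: str
    run: Callable[[dict], str]
    needs_approval: bool = False  # gate risky tools behind a human decision


@dataclass
class AgentRun:
    """Executes a plan against an allowlist of tools with a hard step budget."""
    tools: dict[str, Tool]
    max_steps: int = 5
    audit_log: list[dict] = field(default_factory=list)

    def execute(
        self,
        plan: list[tuple[str, dict]],
        approve: Callable[[str, dict], bool],
    ) -> list[str]:
        results: list[str] = []
        for step, (tool_name, args) in enumerate(plan):
            # Deterministic bound: stop once the step budget is spent.
            if step >= self.max_steps:
                self.audit_log.append({"step": step, "event": "budget_exceeded"})
                break
            # Bounded tools: only allowlisted names may run.
            tool = self.tools.get(tool_name)
            if tool is None:
                self.audit_log.append(
                    {"step": step, "tool": tool_name, "event": "rejected_unknown_tool"}
                )
                continue
            # Human-in-the-loop: risky tools require explicit approval.
            if tool.needs_approval and not approve(tool_name, args):
                self.audit_log.append(
                    {"step": step, "tool": tool_name, "event": "denied_by_human"}
                )
                continue
            output = tool.run(args)
            # Auditable state: record every executed call.
            self.audit_log.append(
                {"step": step, "tool": tool_name, "args": args, "event": "executed"}
            )
            results.append(output)
        return results


if __name__ == "__main__":
    run = AgentRun(
        tools={
            "echo": Tool("echo", lambda a: a["text"]),
            "delete": Tool("delete", lambda a: "deleted", needs_approval=True),
        },
        max_steps=3,
    )
    plan = [("echo", {"text": "hi"}), ("delete", {}), ("shell", {})]
    print(run.execute(plan, approve=lambda name, args: False))
    for entry in run.audit_log:
        print(entry)
```

In this sketch every decision, including rejections and denials, is written to `audit_log`, so a reviewer can reconstruct why a step did or did not run. A production system would likely replace the in-memory list with durable storage or trace spans.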
When to Use This Skill
Codex should activate this skill whenever the user asks for help with:
- Designing an agent (LLM-based, tool-based, OS-based, or multi-agent).
- Scoping capability maturity and rollout risk for new agent behaviors.
- Creating action loops, plans, workflows, or delegation logic.
- Writing tool definitions, MCP tools, schemas, or validation logic.