guardrails
Guardrails & Safety
Guardrails are the firewall of an AI system. They sit between the user and the agent (Input Guardrail) and between the agent and the user (Output Guardrail). They enforce policy, security, and tone. Unlike the main agent, which tries to be helpful, the guardrail tries to be safe and compliant.
When to Use
- Jailbreak Prevention: Stopping users from tricking the model ("Ignore previous instructions...").
- PII Protection: Detecting and redacting phone numbers, emails, or credit cards.
- Topic Adherence: Ensuring a customer support bot doesn't discuss politics or religion.
- Brand Safety: preventing the model from generating offensive or competitor-promoting content.
Use Cases
- Input Filter: Blocking prompts that violate usage policies.
- Output Filter: Blocking model responses that contain hate speech or hallucinations.
- Sandboxing: Ensuring code generated by the agent acts within safe bounds (e.g., no network access).
Implementation Pattern
More from lauraflorentin/skills-marketplace
multi-agent-collaboration
A structural pattern where multiple specialized agents communicate and coordinate to solve a problem that is too complex for a single agent. Use when user asks to "build a multi-agent system", "agents working together", "agent collaboration", or mentions team of agents, distributed agents, or swarm.
23reflection
A recursive pattern where an agent evaluates and critiques its own output to iteratively improve quality and catch errors. Use when user asks to "add self-reflection", "agent introspection", "self-critique", or mentions self-evaluation, meta-cognition, or quality self-assessment.
18human-in-the-loop
A hybrid pattern where the system pauses execution to request human approval, input, or disambiguation before proceeding with critical actions. Use when user asks to "add human approval", "require human review", "human-in-the-loop", or mentions approval workflows, human oversight, or escalation.
17planning
A high-level cognitive pattern where an agent formulates a structured sequence of actions (a plan) before executing any of them, ensuring goal-directed behavior. Use when user asks to "add planning to my agent", "task planning", "agent planning", or mentions plan generation, plan execution, or step-by-step planning.
14parallelization
A concurrency pattern where multiple agent tasks are executed at the same time to speed up processing or gather diverse perspectives. Use when user asks to "run agents in parallel", "parallelize tasks", "concurrent execution", or mentions parallel processing, fan-out, or batch execution.
13routing
A control flow pattern where a central component classifies an input request and directs it to the most appropriate specialized agent or tool. Use when user asks to "route between agents", "agent routing", "task dispatch", or mentions classifier routing, intent detection, or agent selection.
12