Guardrails & Safety

Guardrails act as the firewall of an AI system. An Input Guardrail sits between the user and the agent and screens each message before the agent sees it; an Output Guardrail sits between the agent and the user and screens each response before it is returned. Both enforce policy, security, and tone. Unlike the main agent, which is optimized to be helpful, a guardrail is optimized to be safe and compliant.

When to Use

  • Jailbreak Prevention: Stopping users from tricking the model ("Ignore previous instructions...").
  • PII Protection: Detecting and redacting phone numbers, emails, or credit cards.
  • Topic Adherence: Ensuring a customer support bot doesn't discuss politics or religion.
  • Brand Safety: Preventing the model from generating offensive or competitor-promoting content.
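
The first two items above can be sketched with simple pattern matching. This is a minimal illustration, not a production detector: the regular expressions, the phrase list, and the function names `redact_pii` and `looks_like_jailbreak` are all assumptions made for this example, and real systems usually combine such heuristics with a trained classifier.

```python
import re

# Illustrative patterns only (assumptions); real PII detection needs far
# broader coverage, e.g. international phone formats and checksum validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

# Hypothetical phrase list; attackers rephrase easily, so treat this as a
# first-pass filter rather than a complete defense.
JAILBREAK_PHRASES = (
    "ignore previous instructions",
    "ignore all previous instructions",
    "disregard your system prompt",
)


def redact_pii(text: str) -> str:
    """Replace emails, card numbers, and phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    # Cards are redacted before phones so a long digit run is not
    # partially matched by the shorter phone pattern.
    text = CARD_RE.sub("[CARD]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


def looks_like_jailbreak(text: str) -> bool:
    """Return True if the text contains a known override phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in JAILBREAK_PHRASES)
```

Redaction rewrites the text and lets the request continue, while the jailbreak check returns a boolean so the caller can decide whether to block, log, or escalate.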

Use Cases

  • Input Filter: Blocking prompts that violate usage policies.
  • Output Filter: Blocking model responses that contain hate speech or hallucinated (unsupported) claims.
  • Sandboxing: Ensuring code generated by the agent acts within safe bounds (e.g., no network access).
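
For the sandboxing case, one inexpensive first layer is a static check of generated code before it runs. The sketch below uses Python's standard `ast` module to flag imports of network-capable modules; the deny-list `BLOCKED_MODULES` and the function name `find_blocked_imports` are assumptions for illustration. A static check like this is easy to bypass (for example through `__import__` or `exec`), so it would complement real isolation such as a container with networking disabled, not replace it.

```python
import ast

# Hypothetical deny-list of modules that can open network connections.
BLOCKED_MODULES = {"socket", "urllib", "http", "requests", "ftplib", "smtplib"}


def find_blocked_imports(source: str) -> set[str]:
    """Return the blocked top-level modules that `source` imports.

    Raises SyntaxError if `source` is not valid Python.
    """
    tree = ast.parse(source)
    found: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports like `from . import x`.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in BLOCKED_MODULES:
                found.add(top_level)
    return found
```

A caller would refuse to execute the generated code whenever the returned set is non-empty.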

Implementation Pattern
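
One way to sketch the pattern, assuming the agent is any callable that maps a string to a string: run every input check, call the agent only if all pass, then run every output check on the response. The names `Verdict`, `guarded_call`, `no_injection`, `on_topic`, and `echo_agent` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


# A check inspects text and returns a Verdict (hypothetical interface).
Check = Callable[[str], Verdict]

REFUSAL = "Sorry, I can't help with that request."


def guarded_call(
    user_input: str,
    agent: Callable[[str], str],
    input_checks: Sequence[Check],
    output_checks: Sequence[Check],
) -> str:
    """Run input guardrails, then the agent, then output guardrails."""
    for check in input_checks:
        if not check(user_input).allowed:
            return REFUSAL  # the agent never sees a blocked prompt
    response = agent(user_input)
    for check in output_checks:
        if not check(response).allowed:
            return REFUSAL  # the user never sees a blocked response
    return response


def no_injection(text: str) -> Verdict:
    """Example input check: block one well-known override phrase."""
    if "ignore previous instructions" in text.lower():
        return Verdict(False, "possible prompt injection")
    return Verdict(True)


def on_topic(text: str) -> Verdict:
    """Example output check: keep a support bot off politics and religion."""
    lowered = text.lower()
    if "politics" in lowered or "religion" in lowered:
        return Verdict(False, "off-topic content")
    return Verdict(True)


def echo_agent(text: str) -> str:
    """Stand-in for a real model call."""
    return f"You asked: {text}"
```

Calling `guarded_call("Where is my order?", echo_agent, [no_injection], [on_topic])` passes both stages and returns the agent's response, while a prompt containing the override phrase is refused before the agent runs. Keeping checks as plain callables lets each policy be tested and swapped independently of the agent.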
