Auto-Moderate What Users Post

Guide the user through building AI content moderation: classify user-generated content, score its severity, and route each item to a decision (auto-approve, human-review, or auto-reject). The pattern: classify, score, route.
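The classify-score-route pattern can be sketched as follows. This is a minimal illustration, not the skill's implementation: the thresholds, `Decision` type, and routing function are hypothetical, and in a real pipeline the category and severity would come from an LM call rather than being passed in directly.

```python
from dataclasses import dataclass

# Hypothetical thresholds -- tune these against your policy and
# your tolerance for false positives.
APPROVE_BELOW = 0.3
REJECT_ABOVE = 0.8

@dataclass
class Decision:
    category: str   # e.g. "spam", "harassment"
    severity: float  # 0.0 (benign) .. 1.0 (severe), from the scoring step
    route: str       # "auto-approve" | "human-review" | "auto-reject"

def route(category: str, severity: float) -> Decision:
    """Route an already-classified, already-scored item."""
    if severity < APPROVE_BELOW:
        action = "auto-approve"
    elif severity > REJECT_ABOVE:
        action = "auto-reject"
    else:
        # The uncertain middle band goes to humans.
        action = "human-review"
    return Decision(category, severity, action)
```

Note the asymmetry: only clearly benign content is auto-approved and only clearly severe content is auto-rejected; everything ambiguous lands in the human-review queue.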

When NOT to use AI moderation

  • Low-volume content — if a human can review everything in under an hour per day, skip AI. The complexity of maintaining a moderation pipeline is not worth it.
  • Exact-match violations only — if your policy is just a blocklist of words or regex patterns (SSNs, emails, phone numbers), use pattern matching directly. No LM needed.
  • Legal-grade decisions — AI moderation is a first pass, not a legal ruling. If a wrong moderation decision has legal consequences (DMCA takedowns, defamation claims), always route to human review.
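For the exact-match case above, plain pattern matching is enough. A sketch, with illustrative (not production-grade) regexes for US-style SSNs, emails, and phone numbers:

```python
import re

# Simple pattern checks -- no LM needed for exact-match violations.
# These patterns are illustrative; harden them for real use.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def find_violations(text: str) -> list[str]:
    """Return the names of all patterns that match the text."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]
```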

Consider /ai-sorting instead if you just need classification without severity scoring or routing logic.

Step 1: Define your moderation policy

Ask the user:

  1. What content do you need to catch? (hate speech, spam, NSFW, harassment, self-harm, illegal activity, PII)
  2. What are the severity levels and their actions? (warn, remove, ban)
  3. What is the tolerance for false positives? (over-moderating frustrates users)
  4. Is human review in the loop? (auto-only vs. auto + human escalation)