ml-research-engineer-safeguards
ML / Research Engineer, Safeguards
When to Use
- Define research questions on harm detection, jailbreak resistance, or policy categories
- Curate or audit safety datasets — labeling guidelines, bias checks, version control
- Train or fine-tune classifiers, rankers, or small LLM judges for moderation
- Design benchmarks and eval suites — golden sets, adversarial slices, regression harnesses
- Run ablations — architecture, threshold, data mix, ensemble vs single model
- Analyze metrics — precision/recall, calibration, false positive/negative slices
- Write research memos — methods, results, limitations, production recommendation
- Specify the promotion bar for a new safeguard model version
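
As an illustration of the metric-analysis and promotion-bar items above, here is a minimal sketch in plain Python. The names `SliceMetrics`, `compute_metrics`, and `meets_promotion_bar`, and the default thresholds, are hypothetical and chosen only for this example. It assumes binary labels (1 = harmful) and a classifier score where higher means more likely harmful. A real promotion bar would likely also cover calibration and per-slice results.

```python
from dataclasses import dataclass


@dataclass
class SliceMetrics:
    precision: float
    recall: float
    false_positive_rate: float


def compute_metrics(labels, scores, threshold):
    """Binary metrics on one eval slice; a score >= threshold counts as flagged."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
        else:
            tn += 1
    # Guard against empty denominators on small or one-sided slices.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return SliceMetrics(precision, recall, fpr)


def meets_promotion_bar(candidate, baseline, min_recall=0.90, max_fpr_regression=0.01):
    """Hypothetical bar: enough recall, and FPR not much worse than the baseline."""
    return (
        candidate.recall >= min_recall
        and candidate.false_positive_rate
        <= baseline.false_positive_rate + max_fpr_regression
    )


if __name__ == "__main__":
    # Toy data for illustration only; not real evaluation results.
    labels = [1, 1, 0, 0, 1]
    scores = [0.9, 0.4, 0.6, 0.1, 0.8]
    print(compute_metrics(labels, scores, threshold=0.5))
```

Running the same functions separately on each adversarial or policy-category slice, rather than only on the pooled golden set, would surface regressions that an aggregate number can hide.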