ML Infrastructure Engineer, Safeguards

When to Use

Design inference gateway with safeguard stages (auth, rate limit, pre-filter, model, post-filter)
Deploy model servers — GPU pools, replicas, autoscaling, health checks
Operate moderation/classifier services (hosted or self-hosted) in production
Configure policy runtime — thresholds, categories, block vs rewrite vs escalate
Instrument safety metrics — block rate, false positive sampling, safety-path latency
Roll out safeguard model versions — canary, rollback, config flags
Plan capacity for safety and main models together (queueing, load shedding, degradation modes)
Integrate human review queues and appeal flows at the infrastructure boundary
Debug production incidents — safety service down, filter bypass, p99 on guard path
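
As a rough illustration of the policy runtime and guard path named above, the following minimal sketch maps per-category classifier scores to allow / rewrite / escalate / block and wraps a model call with a pre-filter and post-filter. Everything in it is an assumption made for illustration: the names (`CategoryPolicy`, `evaluate`, `guarded_generate`), the category labels, and the threshold values are hypothetical, and auth and rate limiting are left out. It also assumes a fail-closed degradation mode, where a classifier outage blocks traffic rather than letting it through unfiltered.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Tuple


class Action(Enum):
    ALLOW = 0
    REWRITE = 1
    ESCALATE = 2
    BLOCK = 3  # higher value = more severe


@dataclass(frozen=True)
class CategoryPolicy:
    """Score thresholds for one category (hypothetical shape)."""
    rewrite_at: float
    escalate_at: float
    block_at: float

    def decide(self, score: float) -> Action:
        if score >= self.block_at:
            return Action.BLOCK
        if score >= self.escalate_at:
            return Action.ESCALATE
        if score >= self.rewrite_at:
            return Action.REWRITE
        return Action.ALLOW


# Illustrative values only; real thresholds come from offline evaluation.
POLICY: Dict[str, CategoryPolicy] = {
    "violence": CategoryPolicy(rewrite_at=0.5, escalate_at=0.7, block_at=0.9),
    "self_harm": CategoryPolicy(rewrite_at=0.3, escalate_at=0.5, block_at=0.8),
}

Classifier = Callable[[str], Dict[str, float]]


def evaluate(scores: Dict[str, float],
             policy: Dict[str, CategoryPolicy] = POLICY) -> Action:
    """Return the most severe action across all scored categories."""
    worst = Action.ALLOW
    for category, score in scores.items():
        rule = policy.get(category)
        if rule is None:
            continue  # categories without a configured policy are ignored here
        action = rule.decide(score)
        if action.value > worst.value:
            worst = action
    return worst


def safe_evaluate(text: str, classify: Classifier, fail_closed: bool) -> Action:
    """Run the classifier; on failure, apply the chosen degradation mode."""
    try:
        return evaluate(classify(text))
    except Exception:
        return Action.BLOCK if fail_closed else Action.ALLOW


def guarded_generate(prompt: str, classify: Classifier,
                     generate: Callable[[str], str],
                     fail_closed: bool = True) -> Tuple[Action, str]:
    """Pre-filter -> model -> post-filter. Returns (final action, text)."""
    pre = safe_evaluate(prompt, classify, fail_closed)
    if pre in (Action.BLOCK, Action.ESCALATE):
        return pre, ""  # the model is never called
    # A REWRITE verdict on the prompt is let through in this sketch;
    # a real gateway might sanitize the prompt instead.
    output = generate(prompt)
    post = safe_evaluate(output, classify, fail_closed)
    if post in (Action.BLOCK, Action.ESCALATE):
        return post, ""
    if post is Action.REWRITE:
        return post, "[redacted]"  # placeholder for a real rewrite step
    return post, output
```

Calling `guarded_generate(prompt, classify, generate)` with a classifier that raises returns `(Action.BLOCK, "")` under the default `fail_closed=True`. Passing `fail_closed=False` trades that safety margin for availability, which is one of the degradation-mode decisions capacity planning has to make explicit.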