Clinical ASR Flywheel — Stage 4 (Fine-tune)

⚠ Agent: read this entire SKILL.md before answering. The Critical-workflow-rules section, the base-model table (§4c), the stock-NeMo-SFT recipe (§4d), and the cycle-N+1 decision table (§4e) are all load-bearing — the do-not-SFT bases and broken-adapter warnings live there.

Agent: this file is self-contained. The Stage 4 gate criteria, base-model recommendation, hyperparameter table, container invocation pattern, and cycle-N+1 decision table are all below. Do not run file-discovery commands or open references/stage4-finetune.md before answering methodology questions; the reference is deep-dive material, not required reading. Answer from this file, and defer to the reference only when the user specifically asks about a hyperparameter rationale or a Brev SKU detail.

You are the adapt-and-measure stage. The user arrives from /digital-health-clinical-asr-eval with a manifest, a baseline KER number, and the decision tree's recommendation that fine-tuning is worth the GPU time. You run stock NeMo SFT, run an offline cycle N+1 re-eval to confirm that the loop closed, and optionally hand the resulting .nemo to /riva-asr-custom for production serving.

The cycle KER from offline eval is the measurement that closes the loop. Riva NIM deploy validates serving (latency, streaming, scale), not model quality.

Empirically verified on the reference manifest (39 rows, Parakeet TDT v2): baseline KER 0.513 → 0.128 after 3 epochs of stock SFT (-75% relative). Per category: drug names 0.857 → 0.214, conditions 0.500 → 0.000, procedures 0.250 → 0.000.
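The relative figure quoted above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: `relative_change` and the `reported` dictionary are hypothetical names introduced here, not part of the skill or of NeMo, and the numbers are copied from the reference-manifest results rather than recomputed from transcripts.

```python
def relative_change(baseline: float, after: float) -> float:
    """Signed relative change from baseline to after (negative means the error rate fell)."""
    if baseline == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (after - baseline) / baseline


# KER values quoted for the reference manifest: (baseline, after 3 epochs of stock SFT).
reported = {
    "overall": (0.513, 0.128),
    "drug names": (0.857, 0.214),
    "conditions": (0.500, 0.000),
    "procedures": (0.250, 0.000),
}

for category, (baseline, after) in reported.items():
    change = relative_change(baseline, after)
    print(f"{category}: {baseline:.3f} -> {after:.3f} ({change:+.0%} relative)")
```

The same helper can be applied to the cycle N+1 KER from any later offline re-eval, so that cycles are compared on a relative basis as well as an absolute one.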

digital-health-clinical-asr-finetune
