Machine Learning Engineering Guide

Overview

This guide covers end-to-end machine learning engineering with deep learning (PyTorch, Hugging Face Transformers) and classical ML (scikit-learn, XGBoost). Use it when building, training, evaluating, and deploying ML models across NLP, vision, and tabular domains.

First 10 Minutes

Identify the task type first: classification, regression, ranking, generation, retrieval, or multimodal. If the task type is fuzzy, the evaluation plan will be wrong.
Inspect the dataset shape and leakage risk before choosing a model. Run scripts/analyze_dataset.py first, then document label balance, missing values, and leakage candidates.
Define the baseline and acceptance metric before training. If there is no baseline, create one first.
If the request involves RAG, separate retrieval evaluation from answer evaluation from the start.
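
The dataset checks and the baseline step above can be sketched roughly as follows. This is not the contents of scripts/analyze_dataset.py, which is not shown here. The function names `profile_dataset` and `majority_baseline_accuracy`, the dict-based row format, the use of None for missing values, and the toy churn data are all assumptions made for illustration. Exact-duplicate overlap is only one kind of leakage candidate; target-derived features and temporal leakage need separate checks.

```python
from collections import Counter


def profile_dataset(train_rows, test_rows, label_key):
    """Summarize label balance, missing values, and exact-duplicate leakage.

    Rows are plain dicts and None marks a missing value (assumptions of this sketch).
    """
    labels = Counter(row[label_key] for row in train_rows)
    total = sum(labels.values())
    label_balance = {label: count / total for label, count in labels.items()}

    # Fraction of missing (None) values per feature column in the training split.
    columns = [key for key in train_rows[0] if key != label_key]
    missing = {
        col: sum(1 for row in train_rows if row.get(col) is None) / len(train_rows)
        for col in columns
    }

    # Leakage candidate: test rows whose features exactly match a training row.
    def features(row):
        return tuple(row.get(col) for col in columns)

    seen = {features(row) for row in train_rows}
    overlap = sum(1 for row in test_rows if features(row) in seen)

    return {
        "label_balance": label_balance,
        "missing_fraction": missing,
        "train_test_duplicates": overlap,
    }


def majority_baseline_accuracy(train_rows, test_rows, label_key):
    """Accuracy of always predicting the most common training label."""
    majority = Counter(row[label_key] for row in train_rows).most_common(1)[0][0]
    hits = sum(1 for row in test_rows if row[label_key] == majority)
    return hits / len(test_rows)


# Hypothetical toy data, for illustration only.
train = [
    {"age": 31, "plan": "basic", "churned": 0},
    {"age": None, "plan": "pro", "churned": 0},
    {"age": 45, "plan": "pro", "churned": 1},
    {"age": 28, "plan": "basic", "churned": 0},
]
test = [
    {"age": 45, "plan": "pro", "churned": 1},  # same features as a training row
    {"age": 52, "plan": "basic", "churned": 0},
]

report = profile_dataset(train, test, label_key="churned")
baseline = majority_baseline_accuracy(train, test, label_key="churned")
print(report)
print(f"majority-class baseline accuracy: {baseline:.2f}")
```

Any trained model then has to beat the majority-class number on the agreed acceptance metric. If it does not, the problem is more likely in the data or the task framing than in the model.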
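
For the RAG case, one way to keep the two evaluations separate is to score retrieval and answers with different metrics on the same examples. The sketch below assumes recall@k for retrieval and normalized exact match for answers. Both are common choices but not the only ones, and the document ids, field names, and examples are hypothetical.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top-k retrieved list."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def exact_match(predicted, reference):
    """1.0 if the normalized answer strings are identical, else 0.0."""
    return float(normalize(predicted) == normalize(reference))


# Hypothetical evaluation examples: retrieved ids are in ranked order.
examples = [
    {"retrieved": ["d3", "d7", "d1"], "relevant": ["d1"],
     "answer": "Paris", "reference": "paris"},
    {"retrieved": ["d2", "d9", "d4"], "relevant": ["d5"],
     "answer": "1989", "reference": "1991"},
    {"retrieved": ["d8", "d6", "d0"], "relevant": ["d6"],
     "answer": "blue", "reference": "green"},
]

retrieval_score = sum(
    recall_at_k(ex["retrieved"], ex["relevant"], k=3) for ex in examples
) / len(examples)
answer_score = sum(
    exact_match(ex["answer"], ex["reference"]) for ex in examples
) / len(examples)

print(f"retrieval recall@3: {retrieval_score:.2f}")
print(f"answer exact match: {answer_score:.2f}")
```

In the third example the relevant document was retrieved but the answer is still wrong, which points at the generation step. In the second, retrieval missed the document, so changing the generator would not help. A single end-to-end score would hide that distinction.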

Refuse or Escalate

Refuse requests to fine-tune when there is no labeled data, no evaluation set, or no baseline to beat.
Escalate if the task is high-stakes and the user cannot provide evaluation criteria, data provenance, or rollback behavior for a bad model.
Do not recommend a larger model by default when the failure is clearly dataset quality, leakage, or retrieval mismatch.
Escalate before production rollout if the team cannot monitor latency, output drift, and failure rate after deployment.
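
As a rough picture of the monitoring the last rule asks for, the sketch below computes p95 latency, failure rate, and a simple output drift score from logged requests. The log format, the choice of total variation distance as the drift measure, and all three thresholds are assumptions for illustration. Real limits should come from the team's own service targets.

```python
import math


def p95(values):
    """Nearest-rank 95th percentile of a list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def failure_rate(records):
    """Share of logged requests that ended in an error."""
    return sum(1 for record in records if record["error"]) / len(records)


def total_variation(reference_counts, live_counts):
    """Total variation distance between two categorical output distributions."""
    ref_total = sum(reference_counts.values())
    live_total = sum(live_counts.values())
    labels = set(reference_counts) | set(live_counts)
    return 0.5 * sum(
        abs(reference_counts.get(label, 0) / ref_total
            - live_counts.get(label, 0) / live_total)
        for label in labels
    )


# Hypothetical request log: nine successes and one slow, failed request.
latencies_ms = [120, 130, 110, 140, 125, 135, 115, 900, 128, 132]
errors = [False] * 9 + [True]
records = [
    {"latency_ms": ms, "error": err} for ms, err in zip(latencies_ms, errors)
]

# Hypothetical predicted-label counts at validation time vs. in production.
reference_outputs = {"approve": 80, "reject": 20}
live_outputs = {"approve": 50, "reject": 50}

# Hypothetical thresholds, not recommendations.
LIMITS = {"p95_latency_ms": 500, "failure_rate": 0.05, "output_drift": 0.2}

observed = {
    "p95_latency_ms": p95([record["latency_ms"] for record in records]),
    "failure_rate": failure_rate(records),
    "output_drift": total_variation(reference_outputs, live_outputs),
}
alerts = [name for name, value in observed.items() if value > LIMITS[name]]

print(observed)
print("alerts:", alerts)
```

If a team cannot produce even these three numbers for a deployed model, that is the signal to escalate before rollout.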
