Model Training

This skill enables an AI agent to train machine learning models on structured or unstructured datasets. It covers the full training lifecycle: loading and preprocessing data, defining model architectures, configuring optimizers and loss functions, running training loops with validation, applying learning rate scheduling, and saving checkpoints. The agent can handle both classical ML and deep learning workflows across frameworks like PyTorch, TensorFlow, and scikit-learn.

Workflow

  1. Load and inspect data: Read the dataset from disk, database, or remote storage. Profile the data to understand feature distributions, class balance, missing values, and data types. Split into training, validation, and test sets using stratified sampling when class imbalance is present.

  2. Preprocess and transform: Apply feature engineering such as normalization, standardization, tokenization (for text), or augmentation (for images). Build preprocessing pipelines that are reproducible and serializable so the same transforms apply at inference time.

  3. Define model architecture: Select or construct the model architecture appropriate for the task. For classical ML, choose estimators like gradient boosting or SVMs. For deep learning, define layers, activation functions, and regularization such as dropout or weight decay. When transfer learning is applicable, load a pre-trained backbone and attach task-specific heads.

  4. Configure training: Set the optimizer (Adam, SGD, AdamW), loss function (cross-entropy, MSE, focal loss), learning rate schedule (cosine annealing, step decay, warmup), and batch size. Enable mixed precision training with torch.amp or tf.keras.mixed_precision when training on GPUs to reduce memory usage and speed up computation.

  5. Execute training loop with validation: Train for the specified number of epochs, logging training loss and metrics per batch or epoch. Evaluate on the validation set at regular intervals. Implement early stopping to halt training when validation performance plateaus for a configurable number of epochs (patience).

  6. Checkpoint and export: Save model checkpoints at the best validation score and at regular intervals. Export the final model in a portable format (ONNX, TorchScript, SavedModel) for downstream deployment. Log all hyperparameters and metrics to an experiment tracker like MLflow or Weights & Biases.
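Step 1's stratified split is a one-liner in scikit-learn. A minimal sketch, using a deliberately imbalanced toy dataset (the 90/10 class ratio and all values below are purely illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy dataset: 50 samples, 90/10 class imbalance.
X = np.arange(100).reshape(50, 2)
y = np.array([0] * 45 + [1] * 5)

# stratify=y preserves the class ratio in both splits, so the rare
# class is not accidentally concentrated in (or absent from) one split.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 10 validation samples -> exactly 1 positive; 40 training -> 4 positives.
print(int(y_train.sum()), int(y_val.sum()))
```

Without `stratify=y`, a small validation set can easily end up with zero minority-class samples, making validation metrics meaningless for that class.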
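Step 2's requirement that preprocessing be reproducible and serializable maps directly onto scikit-learn's `Pipeline` plus `joblib` persistence. A sketch, assuming a simple scaler-plus-classifier setup on synthetic data:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Bundle the transform and the estimator so they are fit, saved,
# and reloaded as a single unit.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])

# Synthetic data for illustration only.
rng = np.random.RandomState(0)
X = rng.randn(40, 3)
y = (X[:, 0] > 0).astype(int)
pipe.fit(X, y)

# Serialize the fitted pipeline; the identical transforms (including the
# scaler's learned mean/std) apply at inference time after reloading.
path = os.path.join(tempfile.mkdtemp(), "pipeline.joblib")
joblib.dump(pipe, path)
reloaded = joblib.load(path)
```

Serializing the whole pipeline, rather than the model alone, is what prevents train/serve skew: the scaler statistics travel with the estimator.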
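Step 4's warmup-plus-cosine schedule reduces to a few lines of arithmetic. The standalone function below is a hypothetical sketch (its defaults are illustrative); in practice frameworks ship equivalents such as PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR`:

```python
import math

def lr_at(step, max_steps, base_lr=3e-4, warmup=10):
    """Learning rate at a given step: linear warmup, then cosine decay to 0.

    Hypothetical defaults for illustration; tune base_lr and warmup per task.
    """
    if step < warmup:
        # Linear ramp from base_lr/warmup up to base_lr.
        return base_lr * (step + 1) / warmup
    # Cosine anneal from base_lr down to 0 over the remaining steps.
    progress = (step - warmup) / max(1, max_steps - warmup)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```

Warmup avoids unstable early updates at full learning rate; the cosine tail decays smoothly instead of dropping in discrete steps.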
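The patience logic in step 5 is framework-agnostic, so it can be sketched in plain Python and dropped into any training loop (the class name and API here are assumptions, not a specific library's interface):

```python
class EarlyStopping:
    """Stop training when validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should halt."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Illustrative loss curve: improves for three epochs, then plateaus.
stopper = EarlyStopping(patience=2)
losses = [1.0, 0.8, 0.79, 0.81, 0.82]
stops = [stopper.step(loss) for loss in losses]
```

In a real loop, `step` would be called once per validation pass, breaking out of the epoch loop when it returns True; it pairs naturally with step 6's practice of checkpointing at the best validation score.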

