engineering-ml-engineer
Machine Learning Engineering Guide
Overview
This guide covers end-to-end machine learning engineering with deep learning (PyTorch, HuggingFace Transformers) and classical ML (scikit-learn, XGBoost). Use it when building, training, evaluating, and deploying ML models across NLP, vision, and tabular domains.
First 10 Minutes
- Identify the task type first: classification, regression, ranking, generation, retrieval, or multimodal. If the task type is fuzzy, the evaluation plan will be wrong.
- Inspect the dataset shape and leakage risk before model choice. Use scripts/analyze_dataset.py immediately, then document label balance, missing values, and leakage candidates.
- Define the baseline and acceptance metric before training. If there is no baseline, create one first.
- If the request involves RAG, separate retrieval evaluation from answer evaluation from the start.
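The checks in the list above can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not the contents of scripts/analyze_dataset.py (which is not shown here). It assumes the dataset is a list of dicts, and every function name below is hypothetical. It covers label balance, missing values, one simple leakage candidate (test rows whose features also appear in train), a majority-class baseline, and a retrieval-only metric kept separate from any answer-quality score.

```python
from collections import Counter


def label_balance(labels):
    """Return each label's share of the dataset."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def missing_rates(rows):
    """Return, per column, the fraction of rows where the value is None or absent."""
    columns = {key for row in rows for key in row}
    return {
        col: sum(row.get(col) is None for row in rows) / len(rows)
        for col in columns
    }


def split_overlap(train_rows, test_rows, feature_cols):
    """Count test rows whose feature values also occur in train.

    Exact duplicates across splits are only one kind of leakage candidate;
    target-derived features and time leakage need separate checks.
    """
    seen = {tuple(row.get(c) for c in feature_cols) for row in train_rows}
    return sum(
        tuple(row.get(c) for c in feature_cols) in seen for row in test_rows
    )


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(y == majority for y in test_labels) / len(test_labels)


def recall_at_k(retrieved_ids, relevant_ids, k):
    """Retrieval-only metric: share of relevant documents found in the top k."""
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)
```

A trained model that cannot beat majority_baseline_accuracy on the held-out split has not learned anything useful yet, and a RAG system with low recall_at_k will not be fixed by changes to the generation step.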
Refuse or Escalate
- Refuse requests to fine-tune when there is no labeled data, no evaluation set, or no baseline to beat.
- Escalate if the task is high-stakes and the user cannot provide evaluation criteria, data provenance, or rollback behavior for a bad model.
- Do not recommend a larger model by default when the failure is clearly dataset quality, leakage, or retrieval mismatch.
- Escalate before production rollout if the team cannot monitor latency, output drift, and failure rate after deployment.
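The first two rules above can be read as a pre-flight gate that runs before any fine-tuning work starts. The sketch below is one possible encoding under stated assumptions: the FineTuneRequest fields and the triage function are hypothetical names introduced for illustration, and the two remaining rules (model-size recommendations and post-deployment monitoring) are left out because they depend on context this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class FineTuneRequest:
    # All fields are hypothetical flags a reviewer would fill in by hand.
    has_labeled_data: bool
    has_eval_set: bool
    has_baseline: bool
    high_stakes: bool = False
    has_eval_criteria: bool = True
    has_data_provenance: bool = True
    has_rollback_plan: bool = True


def triage(request):
    """Return (decision, reasons), where decision is 'refuse', 'escalate', or 'proceed'."""
    # Refuse: fine-tuning without labels, an evaluation set, or a baseline.
    missing = [
        name
        for name in ("has_labeled_data", "has_eval_set", "has_baseline")
        if not getattr(request, name)
    ]
    if missing:
        return "refuse", missing

    # Escalate: high-stakes work lacking criteria, provenance, or rollback.
    if request.high_stakes:
        gaps = [
            name
            for name in (
                "has_eval_criteria",
                "has_data_provenance",
                "has_rollback_plan",
            )
            if not getattr(request, name)
        ]
        if gaps:
            return "escalate", gaps

    return "proceed", []
```

Returning the list of unmet conditions alongside the decision makes it easy to tell the requester exactly what to supply before work can begin.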