data-engineering-ai-ml
AI/ML Data Pipelines
Data engineering patterns for AI/ML workloads: embedding generation, vector databases, retrieval-augmented generation (RAG), LLM output monitoring, and batch inference. Covers LanceDB, pgvector, and OpenAI integrations.
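At its core, the retrieval step behind RAG and semantic search is a nearest-neighbor lookup over embeddings. The sketch below shows that step using plain Python lists in place of a vector database such as LanceDB or pgvector. The toy 3-dimensional vectors and the `top_k` helper are hypothetical. A real store would typically use an index instead of scanning every row.

```python
import math
from typing import List, Mapping, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], index: Mapping[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Brute-force scan: return the k (doc_id, score) pairs most similar to query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings"; real embedding models produce far higher-dimensional vectors.
index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.0, 0.0], index, k=2))  # doc-a ranks first, doc-b second
```

In a RAG pipeline, the returned documents' text would then be inserted into the LLM prompt as context.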
When to Use These Patterns
- RAG Applications: Building chatbots, semantic search, and question-answering systems
- LLM Monitoring: Tracking token usage, latency, output quality
- Embedding Pipelines: Generating and storing vector embeddings for ML models
- Batch Inference: Large-scale model inference pipelines
- Feature Stores: Versioned feature data for ML training/serving
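For LLM monitoring, a common first step is rolling per-call logs up into per-model token and latency metrics. This is a minimal sketch, assuming a hypothetical `LLMCall` record. Its token fields mirror the `prompt_tokens` and `completion_tokens` counts that OpenAI's Chat Completions API reports in a response's `usage` object. The model names and numbers are made up. At scale, the same aggregation would more naturally run as a Polars or DuckDB group-by.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class LLMCall:
    """One logged LLM request (hypothetical schema for illustration)."""
    model: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float


def nearest_rank_percentile(sorted_values: List[float], pct: int) -> float:
    """Nearest-rank percentile of an already-sorted, non-empty list."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize_calls(calls: Iterable[LLMCall]) -> Dict[str, Dict[str, float]]:
    """Group calls by model; report call count, total tokens, mean and p95 latency."""
    by_model: Dict[str, List[LLMCall]] = {}
    for call in calls:
        by_model.setdefault(call.model, []).append(call)

    summary: Dict[str, Dict[str, float]] = {}
    for model, rows in by_model.items():
        latencies = sorted(r.latency_ms for r in rows)
        summary[model] = {
            "calls": len(rows),
            "total_tokens": sum(r.prompt_tokens + r.completion_tokens for r in rows),
            "mean_latency_ms": sum(latencies) / len(latencies),
            "p95_latency_ms": nearest_rank_percentile(latencies, 95),
        }
    return summary


# Made-up log rows for two hypothetical models.
calls = [
    LLMCall("model-a", 120, 30, 850.0),
    LLMCall("model-a", 200, 50, 1200.0),
    LLMCall("model-a", 80, 20, 400.0),
    LLMCall("model-b", 500, 100, 2000.0),
]
for model, stats in summarize_calls(calls).items():
    print(model, stats)
```

Tail latency (p95) is tracked alongside the mean because a few slow calls can hide behind an acceptable average.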
Skill Dependencies
- @data-engineering-core: Polars and DuckDB for data processing
- @data-engineering-storage-remote-access: Cloud storage for embeddings and models
- @data-engineering-orchestration: Scheduling and batching of embedding generation
- @data-engineering-quality: Validation of embedding quality