ai-data-integration
AI Data Integration Skill for Claude
Expert guidance for integrating AI/LLM capabilities with data engineering systems. Covers MCP server patterns for warehouses, NL-to-SQL generation, LLM-powered data transformations, and embeddings for data discovery. Security tiers are deeply integrated -- not bolted on.
Not an ML/MLOps skill. Covers how AI agents interact with data platforms, not model training or deployment.
When to Use
Activate when: building MCP servers for warehouse data, implementing NL-to-SQL interfaces, using LLMs for data enrichment/classification/extraction in pipelines, building embeddings for catalog search or semantic matching, designing AI agent access with security guardrails, evaluating when AI adds value vs traditional approaches.
Don't use for: ML model training/experiment tracking, general prompt engineering, chatbots/conversational AI, dbt/DLT/orchestration (use domain skills), Python DataFrame transforms without LLM involvement (python-data-engineering).
Scope Constraints
- This skill generates integration patterns, not production-ready applications.
- All code examples assume read-only warehouse access unless explicitly stated.
- Security guidance covers AI-specific concerns only -- see Security & Compliance Patterns for the full framework.
- Cost estimates are approximate and vary by provider and model version.
More from dtsong/data-engineering-skills
data-observability
Use this skill when implementing monitoring, alerting, and incident response for data pipelines. Covers freshness monitoring, volume anomaly detection, schema change detection, alerting patterns, and incident response workflows. Common phrases: \"data freshness\", \"pipeline monitoring\", \"data anomaly\", \"schema drift\", \"data alerting\", \"incident response\", \"data observability\", \"stale data\". Do NOT use for writing dbt models (use dbt-transforms), pipeline scheduling (use data-pipelines), or data quality testing as deliverables (use data-testing).
3duckdb
Use this skill when working with DuckDB for local data analysis, file ingestion, or data exploration. Covers reading CSV/Excel/Parquet/JSON files into DuckDB, SQL analytics on local data, data profiling, cleaning transformations, and export to various formats. Common phrases: \"analyze this CSV\", \"DuckDB query\", \"local data analysis\", \"read Excel in SQL\", \"profile this data\". Do NOT use for dbt model building (use dbt-transforms with DuckDB adapter) or cloud warehouse administration.
2data-governance
Use this skill when implementing data governance as part of engineering work. Covers data cataloging (dbt docs, external tools), lineage documentation, data classification (PII/PHI taxonomy), access control patterns (RBAC, row-level security), and compliance frameworks (GDPR, HIPAA, SOX, CCPA). Common phrases: \"data catalog\", \"data lineage\", \"PII classification\", \"access control\", \"RBAC\", \"data governance\", \"compliance requirements\". Do NOT use for writing dbt models (use dbt-transforms), pipeline orchestration (use data-pipelines), or data quality testing (use data-testing).
2dlt-extract
Use this skill when building DLT pipelines for file-based or consulting data extraction. Covers Excel/CSV/SharePoint ingestion via DLT, destination swapping (DuckDB dev to warehouse prod), schema contracts for cleaning, and portable pipeline patterns. Common phrases: \"dlt pipeline for files\", \"extract Excel with dlt\", \"portable data pipeline\", \"dlt filesystem source\". Do NOT use for core DLT concepts like REST API or SQL database sources (use data-integration) or pipeline scheduling (use data-pipelines).
2data-testing
Use this skill when designing testing strategies for data pipelines, writing SQL assertions, validating pipeline output, or packaging tests as client deliverables. Covers dbt test patterns, pipeline validation, SQL assertion libraries, test coverage targets, and test-as-deliverable packaging. Common phrases: \"data testing strategy\", \"pipeline validation\", \"SQL assertions\", \"test coverage\", \"test as deliverable\", \"data quality tests\". Do NOT use for writing dbt models (use dbt-transforms), DuckDB analytical queries (use duckdb), or pipeline scheduling (use data-pipelines).
2event-streaming
Use this skill when building real-time or near-real-time data pipelines. Covers Kafka, Flink, Spark Streaming, Snowpipe, BigQuery streaming, materialized views, and batch-vs-streaming decisions. Common phrases: \"real-time pipeline\", \"Kafka consumer\", \"streaming vs batch\", \"low latency ingestion\". Do NOT use for batch integration patterns (use data-integration) or pipeline orchestration (use data-pipelines).
2