etl-pipelines
Purpose
This skill enables OpenClaw to design and implement ETL (extract, transform, load) pipelines for data engineering workflows. It focuses on structured data sources such as databases, files, and APIs, and on moving data reliably into analytics and reporting systems.
When to Use
Use this skill when building data pipelines for batch processing, real-time data ingestion, or data migration. Apply it to scenarios involving large datasets (e.g., >1 TB), integration with tools like Apache Spark or AWS Glue, or automated ETL feeding BI dashboards.
Key Capabilities
- Extract data from sources such as CSV or JSON files, SQL databases, or APIs, using connectors (e.g., JDBC for databases).
- Transform data with operations such as filtering, aggregation, or SQL queries (e.g., via Pandas or Spark DataFrames).
- Load data into targets like PostgreSQL, BigQuery, or S3 buckets with schema validation and error logging; an end-to-end extract-transform-load sketch follows this list.
- Schedule pipelines with cron-like expressions or through orchestration tools like Airflow (see the DAG sketch below).
- Handle incremental loads by tracking last-processed timestamps or using change data capture (CDC), as in the watermark sketch below.
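The sketch below shows one way a minimal batch pipeline built on these capabilities might look, using pandas and SQLAlchemy. The file name, column names, target table, and connection string are illustrative assumptions, not part of this skill's interface.

```python
# Minimal extract-transform-load sketch using pandas and SQLAlchemy.
# The CSV path, column names, table name, and connection string are placeholders.
import pandas as pd
from sqlalchemy import create_engine


def extract(csv_path: str) -> pd.DataFrame:
    """Extract: read a structured source (here a CSV file) into a DataFrame."""
    return pd.read_csv(csv_path, parse_dates=["order_date"])


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Transform: drop invalid rows and aggregate revenue per day."""
    valid = df[df["amount"] > 0]
    return (
        valid.groupby(valid["order_date"].dt.date)["amount"]
        .sum()
        .reset_index(name="daily_revenue")
    )


def load(df: pd.DataFrame, conn_str: str, table: str) -> None:
    """Load: write the result to a SQL target, replacing the table if it exists."""
    engine = create_engine(conn_str)
    df.to_sql(table, engine, if_exists="replace", index=False)


if __name__ == "__main__":
    raw = extract("orders.csv")
    daily = transform(raw)
    load(daily, "postgresql+psycopg2://user:pass@localhost:5432/analytics", "daily_revenue")
```

The same extract/transform/load split maps onto Spark DataFrames when the data no longer fits in memory; only the implementations of the three functions change.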
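For scheduling, a pipeline like the one above is typically wrapped in an orchestrator task. A minimal Airflow DAG sketch, assuming Airflow 2.4+ and a hypothetical `etl_pipeline` module exposing the functions from the previous sketch:

```python
# Minimal Airflow DAG sketch for a daily ETL run.
# The etl_pipeline module, task id, and connection string are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from etl_pipeline import extract, transform, load  # hypothetical module from the sketch above


def run_etl():
    raw = extract("orders.csv")
    daily = transform(raw)
    load(daily, "postgresql+psycopg2://user:pass@localhost:5432/analytics", "daily_revenue")


with DAG(
    dag_id="daily_etl",
    schedule="0 2 * * *",            # cron-like expression: run at 02:00 every day
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    PythonOperator(task_id="run_etl", python_callable=run_etl)
```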
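Incremental loads can be sketched with a watermark: a small state table recording the last processed timestamp, so each run extracts only newer rows. The database URLs, `orders` table, `etl_state` table, and `updated_at` column below are hypothetical examples rather than fixed conventions.

```python
# Minimal incremental-load sketch: extract only rows newer than the stored watermark.
import pandas as pd
from sqlalchemy import create_engine, text

SOURCE = create_engine("postgresql+psycopg2://user:pass@source-db:5432/app")
TARGET = create_engine("postgresql+psycopg2://user:pass@warehouse:5432/analytics")


def read_watermark() -> str:
    """Read the last processed timestamp from a state table in the target."""
    with TARGET.connect() as conn:
        row = conn.execute(text("SELECT last_ts FROM etl_state WHERE job = 'orders'")).fetchone()
    return row[0] if row else "1970-01-01 00:00:00"


def incremental_extract(last_ts: str) -> pd.DataFrame:
    """Pull only rows changed since the watermark."""
    query = text("SELECT * FROM orders WHERE updated_at > :last_ts")
    return pd.read_sql(query, SOURCE, params={"last_ts": last_ts})


def load_and_advance(df: pd.DataFrame) -> None:
    """Append the new rows to the target, then advance the watermark."""
    if df.empty:
        return
    df.to_sql("orders", TARGET, if_exists="append", index=False)
    new_ts = df["updated_at"].max()
    with TARGET.begin() as conn:
        conn.execute(
            text("UPDATE etl_state SET last_ts = :ts WHERE job = 'orders'"),
            {"ts": str(new_ts)},
        )


if __name__ == "__main__":
    load_and_advance(incremental_extract(read_watermark()))
```

Change data capture (CDC) serves the same purpose when the source emits a change log instead of a reliable timestamp column.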