data-lakehouse
Purpose
This skill enables the design and implementation of data lakehouse architectures, combining the low-cost object storage of data lakes with warehouse features for scalable big data storage and analytics. Use it to manage petabyte-scale data with ACID transactions, schema evolution, and optimized query performance on open table formats like Delta Lake or Apache Iceberg.
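As a concrete illustration of those warehouse-style guarantees, here is a minimal sketch of an ACID append to a Delta Lake table followed by a second append that evolves the schema. It assumes a Spark session built with the delta-spark package; the table path is hypothetical.

```python
from pyspark.sql import SparkSession

# Assumes the delta-spark package is on the classpath
# (e.g. spark-submit --packages io.delta:delta-spark_2.12:3.1.0).
spark = (
    SparkSession.builder.appName("lakehouse-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# ACID append: readers never observe a partially written commit.
users = spark.createDataFrame([(1, "alice")], ["id", "name"])
users.write.format("delta").mode("append").save("/tmp/lakehouse/users")

# Schema evolution: a new column arrives; mergeSchema widens the
# table schema atomically instead of failing the write.
more = spark.createDataFrame([(2, "bob", "NL")], ["id", "name", "country"])
(more.write.format("delta")
     .mode("append")
     .option("mergeSchema", "true")
     .save("/tmp/lakehouse/users"))
```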
When to Use
- When handling large-scale ingestion from sources like S3 or Kafka that requires both raw storage and structured querying.
- For analytics workloads needing near-real-time updates, such as ETL pipelines for e-commerce or IoT data processing.
- If you're integrating with Spark or Presto/Trino for SQL analytics over semi-structured or unstructured data in open file formats.
Key Capabilities
- Architecture Design: Generate blueprints for lakehouse setups, including partitioning strategies and metadata management, e.g., using Iceberg table formats (see the Iceberg sketch after this list).
- Data Ingestion: Support batch and streaming ingestion with tools like Apache Spark, handling columnar formats such as Parquet or ORC (see the Kafka-to-Delta sketch below).
- Query Optimization: Apply caching and data-layout tuning for faster queries, such as compacting Delta Lake tables with Z-order clustering (see the OPTIMIZE sketch below).
- Scalability: Auto-scale storage and compute resources via cloud APIs, e.g., AWS Glue for serverless ETL jobs (see the Glue sketch below).
- Security: Enforce table- and row-level access controls using AWS Lake Formation grants and data cells filters (see the grant sketch below).
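A partitioning sketch for the architecture bullet, assuming an Iceberg runtime jar on the classpath and a Spark catalog named `demo` already configured; the database, table, and column names are hypothetical.

```python
# Hidden partitioning: Iceberg derives the partition value from ts,
# so queries filtering on ts prune files without a user-managed
# partition column.
spark.sql("""
    CREATE TABLE demo.db.events (
        event_id BIGINT,
        payload  STRING,
        ts       TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(ts))
""")
```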
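A streaming-ingestion sketch for the ingestion bullet, assuming the spark-sql-kafka connector is available; the broker address, topic name, and storage paths are hypothetical.

```python
from pyspark.sql.functions import col

# Read the raw topic; Kafka keys and values arrive as binary,
# so cast them to strings before writing.
orders = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
    .select(col("key").cast("string"),
            col("value").cast("string"),
            col("timestamp"))
)

# The checkpoint tracks Kafka offsets so the stream resumes
# without duplicating data after a restart.
(orders.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/lakehouse/_checkpoints/orders")
    .outputMode("append")
    .start("/tmp/lakehouse/orders"))
```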
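A layout-tuning sketch for the query-optimization bullet: `OPTIMIZE ... ZORDER BY` is the Delta Lake (2.0+) command for file compaction plus multi-dimensional clustering; the table path and column are hypothetical.

```python
# Compact small files and co-locate rows with similar `key` values,
# so point lookups and range filters on key scan fewer files.
spark.sql("OPTIMIZE delta.`/tmp/lakehouse/orders` ZORDER BY (key)")
```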
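A scalability sketch using boto3 to define and launch a Glue ETL job; the role ARN, script location, and worker sizing are hypothetical placeholders.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Define a serverless Spark job; Glue provisions and scales the workers.
glue.create_job(
    Name="lakehouse-etl",
    Role="arn:aws:iam::123456789012:role/GlueETLRole",  # hypothetical role
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/etl.py",  # hypothetical path
        "PythonVersion": "3",
    },
    GlueVersion="4.0",
    WorkerType="G.1X",
    NumberOfWorkers=10,
)

glue.start_job_run(JobName="lakehouse-etl")
```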
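A security sketch granting table access through Lake Formation; row-level restrictions would additionally attach a data cells filter. All account, role, and table names are hypothetical.

```python
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

# Table-level grant; for row-level rules, create a filter with
# create_data_cells_filter and grant on the DataCellsFilter resource.
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier":
            "arn:aws:iam::123456789012:role/AnalystRole"  # hypothetical
    },
    Resource={"Table": {"DatabaseName": "lakehouse_db", "Name": "orders"}},
    Permissions=["SELECT"],
)
```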