data-lakehouse
Purpose
This skill enables the design and implementation of data lakehouse architectures, combining the low-cost object storage of data lakes with warehouse features for scalable big data storage and analytics. Use it to manage petabyte-scale data with ACID transactions, schema evolution, and optimized query performance on open table formats like Delta Lake or Apache Iceberg.
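As a concrete illustration of those warehouse-style guarantees, here is a minimal sketch of an ACID append to a Delta Lake table followed by a second append that evolves the schema. It assumes a Spark session built with the delta-spark package; the table path is hypothetical.

```python
from pyspark.sql import SparkSession

# Assumes the delta-spark package is on the classpath
# (e.g. spark-submit --packages io.delta:delta-spark_2.12:3.1.0).
spark = (
    SparkSession.builder.appName("lakehouse-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# ACID append: readers never observe a partially written commit.
users = spark.createDataFrame([(1, "alice")], ["id", "name"])
users.write.format("delta").mode("append").save("/tmp/lakehouse/users")

# Schema evolution: a new column arrives; mergeSchema widens the
# table schema atomically instead of failing the write.
more = spark.createDataFrame([(2, "bob", "NL")], ["id", "name", "country"])
(more.write.format("delta")
     .mode("append")
     .option("mergeSchema", "true")
     .save("/tmp/lakehouse/users"))
```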
When to Use
- When handling large-scale ingestion from sources like S3 or Kafka that requires both raw storage and structured querying.
- For analytics workloads needing near-real-time updates, such as ETL pipelines for e-commerce or IoT data processing.
- If you're integrating with Spark or Presto/Trino for SQL analytics over semi-structured or unstructured data in open file formats.
Key Capabilities
- Architecture Design: Generate blueprints for lakehouse setups, including partitioning strategies and metadata management, e.g., using Iceberg table formats (see the Iceberg sketch after this list).
- Data Ingestion: Support batch and streaming ingestion with tools like Apache Spark, handling columnar formats such as Parquet or ORC (see the Kafka-to-Delta sketch below).
- Query Optimization: Apply caching and data-layout tuning for faster queries, such as compacting Delta Lake tables with Z-order clustering (see the OPTIMIZE sketch below).
- Scalability: Auto-scale storage and compute resources via cloud APIs, e.g., AWS Glue for serverless ETL jobs (see the Glue sketch below).
- Security: Enforce table- and row-level access controls using AWS Lake Formation grants and data cells filters (see the grant sketch below).
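A partitioning sketch for the architecture bullet, assuming an Iceberg runtime jar on the classpath and a Spark catalog named `demo` already configured; the database, table, and column names are hypothetical.

```python
# Hidden partitioning: Iceberg derives the partition value from ts,
# so queries filtering on ts prune files without a user-managed
# partition column.
spark.sql("""
    CREATE TABLE demo.db.events (
        event_id BIGINT,
        payload  STRING,
        ts       TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(ts))
""")
```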
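A streaming-ingestion sketch for the ingestion bullet, assuming the spark-sql-kafka connector is available; the broker address, topic name, and storage paths are hypothetical.

```python
from pyspark.sql.functions import col

# Read the raw topic; Kafka keys and values arrive as binary,
# so cast them to strings before writing.
orders = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
    .select(col("key").cast("string"),
            col("value").cast("string"),
            col("timestamp"))
)

# The checkpoint tracks Kafka offsets so the stream resumes
# without duplicating data after a restart.
(orders.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/lakehouse/_checkpoints/orders")
    .outputMode("append")
    .start("/tmp/lakehouse/orders"))
```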
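A layout-tuning sketch for the query-optimization bullet: `OPTIMIZE ... ZORDER BY` is the Delta Lake (2.0+) command for file compaction plus multi-dimensional clustering; the table path and column are hypothetical.

```python
# Compact small files and co-locate rows with similar `key` values,
# so point lookups and range filters on key scan fewer files.
spark.sql("OPTIMIZE delta.`/tmp/lakehouse/orders` ZORDER BY (key)")
```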
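A scalability sketch using boto3 to define and launch a Glue ETL job; the role ARN, script location, and worker sizing are hypothetical placeholders.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Define a serverless Spark job; Glue provisions and scales the workers.
glue.create_job(
    Name="lakehouse-etl",
    Role="arn:aws:iam::123456789012:role/GlueETLRole",  # hypothetical role
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/etl.py",  # hypothetical path
        "PythonVersion": "3",
    },
    GlueVersion="4.0",
    WorkerType="G.1X",
    NumberOfWorkers=10,
)

glue.start_job_run(JobName="lakehouse-etl")
```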
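A security sketch granting table access through Lake Formation; row-level restrictions would additionally attach a data cells filter. All account, role, and table names are hypothetical.

```python
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

# Table-level grant; for row-level rules, create a filter with
# create_data_cells_filter and grant on the DataCellsFilter resource.
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier":
            "arn:aws:iam::123456789012:role/AnalystRole"  # hypothetical
    },
    Resource={"Table": {"DatabaseName": "lakehouse_db", "Name": "orders"}},
    Permissions=["SELECT"],
)
```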