designing-data-storage

Installation

SKILL.md

Designing Data Storage

Comprehensive guide to selecting and using modern data storage formats for analytics and machine learning. Covers six file formats (Parquet, Arrow, Lance, Zarr, Avro, ORC) and three lakehouse table formats (Delta Lake, Apache Iceberg, Apache Hudi).

Quick Format Comparison
- File Formats
- Lakehouse Table Formats
When to Use Which?
- File Formats
- Lakehouse Table Formats
Format Selection Matrix
Detailed Reference Guides
Code Examples

Related skills

More from legout/data-platform-agent-skills

data-science-eda
Exploratory Data Analysis (EDA): profiling, visualization, correlation analysis, and data quality checks. Use when understanding dataset structure, distributions, relationships, or preparing for feature engineering and modeling.
13
data-science-visualization
Data visualization for Python: Matplotlib, Seaborn, Plotly, Altair, hvPlot/HoloViz, and Bokeh. Use when creating exploratory charts, interactive dashboards, publication-quality figures, or choosing the right library for your data and audience.
12
data-engineering-core
Core Python data engineering: Polars, DuckDB, PyArrow, PostgreSQL, ETL patterns, performance tuning, and resilient pipeline construction. Use when building or reviewing batch ETL/dataframe/SQL pipelines in Python.
10
data-science-feature-engineering
Feature engineering for machine learning: encoding, scaling, transformations, datetime features, text features, and feature selection. Use when preparing data for modeling or improving model performance through better representations.
10
data-science-notebooks
Interactive notebooks for data science: Jupyter, JupyterLab, and marimo. Use for exploratory analysis, reproducible research, documentation, and sharing insights with stakeholders.
9
data-engineering-best-practices
Data engineering best practices: medallion architecture, dataset lifecycle, partitioning, file sizing, schema evolution, and append/overwrite/merge patterns across Polars, PyArrow, DuckDB, Delta Lake, and Iceberg. Use when designing production data pipelines or reviewing data platform decisions.
8

Installs

Repository

legout/data-pla…t-skills

First Seen

Mar 16, 2026

Security Audits

Gen Agent Trust HubPass

SocketPass

SnykPass

designing-data-storage

Designing Data Storage

Table of Contents

More from legout/data-platform-agent-skills

data-science-eda

data-science-visualization

data-engineering-core

data-science-feature-engineering

data-science-notebooks

data-engineering-best-practices