data-science-feature-engineering

Feature Engineering

Use this skill for creating, transforming, and selecting features that improve model performance.

When to use this skill

- After EDA: convert insights into features
- Model underperforming: need better representations
- Handling different data types (numerical, categorical, text, datetime)
- Reducing dimensionality or selecting the most predictive features

Feature engineering workflow

Numerical features
- Scaling (StandardScaler, MinMaxScaler, RobustScaler)
- Transformations (log, sqrt, Box-Cox for skewness)
- Binning (equal-width, quantile, custom)
- Interaction features
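
The numerical techniques listed above can be sketched in plain Python. This is a minimal illustration that uses only the standard library so the arithmetic stays visible; the helper names (`standard_scale`, `min_max_scale`, `robust_scale`, `log_transform`, `equal_width_bins`, `quantile_bins`, `interaction`) and the sample data are hypothetical and not part of this skill. In a real pipeline, scikit-learn's `StandardScaler`, `MinMaxScaler`, and `RobustScaler` would normally do this work; the helpers below only approximate their default behaviour. Box-Cox is not shown; `log1p` stands in as a simpler skew-reducing transform.

```python
import bisect
import math
import statistics


def standard_scale(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against constant columns
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0
    return [(v - median) / iqr for v in values]


def log_transform(values):
    """log(1 + x) compresses a long right tail; assumes non-negative inputs."""
    return [math.log1p(v) for v in values]


def equal_width_bins(values, n_bins):
    """Split the value range into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = ((hi - lo) / n_bins) or 1.0
    # The maximum lands exactly on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def quantile_bins(values, n_bins):
    """Assign bins from quantile cut points so bins hold similar counts."""
    cuts = statistics.quantiles(values, n=n_bins, method="inclusive")
    # bisect_left makes each bin right-closed: a value equal to a cut stays below it.
    return [bisect.bisect_left(cuts, v) for v in values]


def interaction(a, b):
    """Element-wise product of two feature columns."""
    return [x * y for x, y in zip(a, b)]


# Hypothetical right-skewed sample with one large outlier.
skewed = [1.0, 2.0, 3.0, 4.0, 100.0]
print(equal_width_bins(skewed, 4))  # [0, 0, 0, 0, 3]
print(quantile_bins(skewed, 4))     # [0, 0, 1, 2, 3]
print(robust_scale(skewed))         # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

On this sample, equal-width binning puts four of the five values into the first bin because the outlier stretches the range, while quantile binning spreads them across bins. The same outlier dominates min-max scaling, which is one reason to consider a median/IQR-based scaler or a log transform for skewed columns.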

Related skills

More from legout/data-platform-agent-skills

- data-science-eda: Exploratory Data Analysis (EDA): profiling, visualization, correlation analysis, and data quality checks. Use when understanding dataset structure, distributions, relationships, or preparing for feature engineering and modeling.
- data-science-visualization: Data visualization for Python: Matplotlib, Seaborn, Plotly, Altair, hvPlot/HoloViz, and Bokeh. Use when creating exploratory charts, interactive dashboards, publication-quality figures, or choosing the right library for your data and audience.
- data-engineering-core: Core Python data engineering: Polars, DuckDB, PyArrow, PostgreSQL, ETL patterns, performance tuning, and resilient pipeline construction. Use when building or reviewing batch ETL/dataframe/SQL pipelines in Python.
- data-science-notebooks: Interactive notebooks for data science: Jupyter, JupyterLab, and marimo. Use for exploratory analysis, reproducible research, documentation, and sharing insights with stakeholders.
- data-engineering-best-practices: Data engineering best practices: medallion architecture, dataset lifecycle, partitioning, file sizing, schema evolution, and append/overwrite/merge patterns across Polars, PyArrow, DuckDB, Delta Lake, and Iceberg. Use when designing production data pipelines or reviewing data platform decisions.
- data-engineering-storage-formats: Modern data serialization formats: Parquet, Apache Arrow (Feather/IPC), Lance (ML-native), Zarr (chunked arrays), Avro, and ORC. Covers compression, partitioning, and format selection.

Repository

legout/data-pla…t-skills

First Seen

Feb 11, 2026

Security Audits

Gen Agent Trust Hub: Fail

Socket: Pass

Snyk: Pass
