Pandas

Overview

Pandas is a Python library for loading, cleaning, transforming, and analyzing tabular data. It provides DataFrames for structured data manipulation, supports CSV, Excel, SQL, JSON, and Parquet formats, and offers powerful groupby aggregation, merge/join operations, time series resampling, and method chaining for building analysis pipelines.

Instructions

When loading data, use pd.read_parquet() for large datasets (Parquet is typically faster to read and smaller on disk than CSV, and it preserves column dtypes), pd.read_csv() with an explicit dtype argument for CSVs, and pd.read_sql() for database queries.
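
A minimal sketch of the loading advice, assuming a small in-memory CSV and an in-memory SQLite database in place of real files. The table name, column names, and values are hypothetical, and the Parquet calls are left commented out because they need pyarrow or fastparquet installed.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "order_id,region,amount\n1,north,10.5\n2,south,20.0\n3,north,7.25\n"

# Explicit dtypes skip type inference and keep columns from changing type between loads.
orders = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"order_id": "int64", "region": str, "amount": "float64"},
)

# Hypothetical database table, created here only so the query has something to read.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "north", 10.5), (2, "south", 20.0), (3, "north", 7.25)],
)

# pd.read_sql() takes a SQL query and a connection and returns a DataFrame.
north = pd.read_sql(
    "SELECT order_id, amount FROM orders WHERE region = 'north' ORDER BY order_id",
    conn,
)
conn.close()

# Parquet round trip (requires pyarrow or fastparquet):
# orders.to_parquet("orders.parquet")
# orders = pd.read_parquet("orders.parquet")
```
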
When cleaning data, handle missing values with fillna() or dropna(), deduplicate with drop_duplicates(), use string methods (.str.strip(), .str.lower()) for text cleaning, and convert types explicitly with astype() and pd.to_datetime().
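
The cleaning steps above could look roughly like this. The DataFrame and its column names are invented for illustration, and the order of the steps (normalize text, drop rows without a customer, deduplicate, then convert types) is one reasonable choice rather than a fixed recipe.

```python
import pandas as pd

# Hypothetical raw data with stray whitespace, mixed case, a duplicate, and missing values.
raw = pd.DataFrame({
    "customer": ["  Alice ", "BOB", "bob", None],
    "signup": ["2024-01-05", "2024-02-10", "2024-02-10", "2024-03-01"],
    "spend": [None, "20", "20", "5"],
})

clean = raw.copy()

# Normalize text first so "BOB" and "bob" compare equal when deduplicating.
clean["customer"] = clean["customer"].str.strip().str.lower()

# Drop rows with no customer, then remove exact duplicate rows.
clean = clean.dropna(subset=["customer"]).drop_duplicates()

# Convert types explicitly instead of relying on inference.
clean["signup"] = pd.to_datetime(clean["signup"])
clean["spend"] = clean["spend"].fillna("0").astype("float64")
```
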
When transforming data, use assign() for computed columns, pipe() for method chaining, melt() and pivot_table() for reshaping, and pd.cut()/pd.qcut() for binning.
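
As a sketch of the transformation methods named above, assuming a small hypothetical table of quarterly sales per store. The helper add_total and the bin edges are made up for the example.

```python
import pandas as pd

# Hypothetical wide-format sales data: one row per store visit, one column per quarter.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "q1": [100, 150, 80, 120],
    "q2": [110, 140, 95, 130],
})


def add_total(df):
    # Hypothetical helper, present only to show how pipe() slots a function into a chain.
    return df.assign(total=df["q1"] + df["q2"])


# pipe() and assign() chain without mutating the original frame;
# pd.cut() bins the computed total into labeled ranges (right edge inclusive).
wide = sales.pipe(add_total).assign(
    size=lambda d: pd.cut(
        d["total"], bins=[0, 200, 260, 400], labels=["small", "medium", "large"]
    )
)

# melt() reshapes wide to long: one row per (store, quarter) observation.
long = sales.melt(
    id_vars="store", value_vars=["q1", "q2"], var_name="quarter", value_name="revenue"
)

# pivot_table() reshapes long back to wide, aggregating duplicates per cell.
by_store = long.pivot_table(
    index="store", columns="quarter", values="revenue", aggfunc="sum"
)
```
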
When aggregating, use groupby().agg() with named aggregation for readable column names, transform() to broadcast results back to original shape, and resample() for time-based grouping.
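
The three aggregation patterns above, sketched on a hypothetical orders table. Column names and values are illustrative only.

```python
import pandas as pd

# Hypothetical orders; 2024-01-01 is a Monday, so the rows span two calendar weeks.
orders = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "amount": [10.0, 30.0, 5.0, 15.0],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-08", "2024-01-09"]),
})

# Named aggregation: output_column=(input_column, function) gives readable column names.
summary = orders.groupby("region").agg(
    total_amount=("amount", "sum"),
    mean_amount=("amount", "mean"),
    n_orders=("amount", "count"),
)

# transform() returns a result aligned to the original rows, so it can be used
# directly in row-level arithmetic such as each order's share of its region total.
region_total = orders.groupby("region")["amount"].transform("sum")
orders["region_share"] = orders["amount"] / region_total

# resample() groups by time bucket; "W" means weekly buckets ending on Sunday.
weekly = orders.set_index("ts")["amount"].resample("W").sum()
```
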
When merging, use pd.merge() with explicit how and validate parameters to catch data quality issues at merge time, and pd.concat() for stacking DataFrames.
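
A short sketch of how validate can surface a data quality problem at merge time, assuming hypothetical orders and customers tables. Here validate="many_to_one" asserts that each customer_id appears at most once on the right side.

```python
import pandas as pd

# Hypothetical tables: many orders per customer, one row per customer.
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 11, 10]})
customers = pd.DataFrame({"customer_id": [10, 11], "name": ["Alice", "Bob"]})

# Explicit how and validate document the intended relationship between the tables.
enriched = pd.merge(
    orders, customers, on="customer_id", how="left", validate="many_to_one"
)

# pd.concat() stacks frames row-wise; here it introduces a duplicate customer row.
bad_customers = pd.concat([customers, customers.iloc[[0]]], ignore_index=True)

# With the duplicate key present, the same merge raises instead of silently
# multiplying rows in the result.
try:
    pd.merge(
        orders, bad_customers, on="customer_id", how="left", validate="many_to_one"
    )
    duplicate_caught = False
except pd.errors.MergeError:
    duplicate_caught = True
```
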
When optimizing performance, use the category dtype for low-cardinality string columns, prefer vectorized operations to .apply(), and store data as Parquet; for datasets over 10 GB, consider Polars or DuckDB.
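
To illustrate two of the performance points, here is a sketch on synthetic data. The column names and row count are arbitrary, and actual memory savings depend on the data and the pandas version.

```python
import numpy as np
import pandas as pd

# Synthetic data: a string column with only three distinct values.
n = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "country": rng.choice(["US", "DE", "JP"], size=n),
    "price": np.arange(n, dtype="float64"),
    "qty": np.full(n, 2, dtype="int64"),
})

# category stores each distinct string once plus a small integer code per row.
bytes_as_strings = df["country"].memory_usage(deep=True)
df["country"] = df["country"].astype("category")
bytes_as_category = df["country"].memory_usage(deep=True)

# Vectorized arithmetic operates on whole columns at once.
df["revenue"] = df["price"] * df["qty"]

# Row-wise equivalent, usually far slower because it calls Python once per row:
# df["revenue"] = df.apply(lambda row: row["price"] * row["qty"], axis=1)
```
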
