vaex-dataframes

Vaex DataFrames

Overview

Vaex is a high-performance Python library for lazy, out-of-core DataFrame operations on datasets too large to fit in RAM. By combining memory-mapped files with lazy evaluation, it can compute statistics at rates of up to roughly a billion rows per second, which enables interactive exploration and analysis without loading the full dataset into memory.

When to Use

  • Processing tabular datasets larger than available RAM (10 GB to terabytes)
  • Fast statistical aggregations on massive datasets (mean, std, quantiles at billion-row scale)
  • Creating visualizations (heatmaps, histograms) of large datasets without sampling
  • Building ML preprocessing pipelines (scaling, encoding, PCA) on big data
  • Converting between data formats (CSV to HDF5/Arrow for fast repeated access)
  • Feature engineering with virtual columns that consume zero additional memory
  • Working with astronomical catalogs, financial time series, or large scientific datasets
  • Not the best fit for data that already fits in RAM: prefer Polars there for in-memory speed
  • Not designed for distributed multi-node computing: prefer Dask for that

Prerequisites

First Seen
Mar 16, 2026