Dask

Overview

Dask is a Python library for parallel and distributed computing that enables three critical capabilities:

Larger-than-memory execution on single machines for data exceeding available RAM
Parallel processing for improved computational speed across multiple cores
Distributed computation supporting terabyte-scale datasets across multiple machines

Dask scales from laptops (processing ~100 GiB) to clusters (processing ~100 TiB) while maintaining familiar Python APIs.
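As a minimal sketch of the chunked, lazy execution model behind these capabilities, the example below builds a small Dask array and reduces it. The array shape and chunk size are illustrative assumptions, chosen small enough to run anywhere. Real larger-than-memory workloads use the same pattern with bigger arrays backed by files.

```python
import dask.array as da

# Describe a 4_000 x 4_000 array of ones split into 1_000 x 1_000 chunks.
# Nothing is allocated yet; Dask only records a task graph.
x = da.ones((4_000, 4_000), chunks=(1_000, 1_000))

# Still lazy: this describes a reduction over all 16 chunks.
total = x.sum()

# compute() runs the graph (on a local thread pool by default for arrays),
# processing chunks in parallel instead of materializing the whole array.
result = total.compute()
print(result)
```

The same code can run against a cluster if a distributed scheduler is configured beforehand. That setup is not shown here.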

Current upstream: dask 2026.3.0 (PyPI, March 2026). Docs: docs.dask.org. Since 2025.1.0, the expression-based DataFrame API with query planning is the only implementation, so do not install dask-expr separately and do not set dataframe.query-planning: False.

Quick Start

Installation