data-engineering-storage-remote-access


Remote Storage Access

A comprehensive guide to accessing cloud storage (S3, GCS, Azure) and remote filesystems in Python. It covers three major libraries (fsspec, pyarrow.fs, and obstore) and their integration with data engineering tools.

Quick Comparison

| Feature | fsspec | pyarrow.fs | obstore |
|---|---|---|---|
| Best For | Broad compatibility, ecosystem integration | Arrow-native workflows, Parquet | High-throughput, performance-critical |
| Backends | S3, GCS, Azure, HTTP, FTP, 20+ more | S3, GCS, HDFS, local | S3, GCS, Azure, local |
| Performance | Good (with caching) | Excellent for Parquet | 9x faster for concurrent ops |
| Dependencies | Backend-specific (s3fs, gcsfs) | Bundled with PyArrow | Zero Python deps (Rust) |
| Async Support | Yes (aiohttp) | Limited | Native sync/async |
| DataFrame Integration | Universal | PyArrow-native | Via fsspec wrapper |
| Maturity | Very mature (2018+) | Mature | New (2025), rapidly evolving |
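To make the "broad compatibility" row concrete, here is a minimal sketch of the fsspec pattern in which the backend is selected by a protocol string. It uses fsspec's built-in `memory` backend so that it runs without credentials or network access. The helper name `roundtrip` and the path `/demo-bucket/data.csv` are hypothetical, chosen only for illustration. Switching to a real store would presumably mean passing `"s3"` or `"gcs"` instead, which assumes the matching backend package (s3fs, gcsfs) is installed and credentials are configured.

```python
import fsspec


def roundtrip(protocol: str, path: str, payload: bytes) -> bytes:
    """Write payload to path on the given fsspec backend, then read it back.

    Hypothetical helper for illustration; not part of fsspec itself.
    """
    # fsspec.filesystem() returns a filesystem object for the named protocol.
    fs = fsspec.filesystem(protocol)

    # The same open()/read()/write() calls are used for every backend.
    with fs.open(path, "wb") as f:
        f.write(payload)
    with fs.open(path, "rb") as f:
        return f.read()


# "memory" is an in-process stand-in for a remote store (an assumption made
# so the example runs anywhere); the path below is a made-up example.
result = roundtrip("memory", "/demo-bucket/data.csv", b"id,value\n1,a\n")

# List the "bucket" contents as plain path strings.
listing = fsspec.filesystem("memory").ls("/demo-bucket", detail=False)
print(result)
print(listing)
```

Because only the protocol string changes between backends, code written this way can be exercised locally against `memory` or `file` before it is pointed at cloud storage.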

When to Use Which?

First Seen
Feb 11, 2026
data-engineering-storage-remote-access — legout/data-platform-agent-skills