data-engineering-storage-remote-access
Remote Storage Access
A guide to accessing cloud object storage (S3, GCS, Azure) and other remote filesystems from Python. It covers three major libraries (fsspec, pyarrow.fs, and obstore) and how each integrates with data engineering tools.
Quick Comparison
| Feature | fsspec | pyarrow.fs | obstore |
|---|---|---|---|
| Best For | Broad compatibility, ecosystem integration | Arrow-native workflows, Parquet | High-throughput, performance-critical |
| Backends | S3, GCS, Azure, HTTP, FTP, 20+ more | S3, GCS, HDFS, local | S3, GCS, Azure, local |
| Performance | Good (with caching) | Excellent for Parquet | Up to 9x higher throughput than fsspec for many concurrent requests (per obstore's own benchmarks) |
| Dependencies | Backend-specific (s3fs, gcsfs) | Bundled with PyArrow | Zero Python deps (Rust) |
| Async Support | Yes (backend-dependent, e.g. aiohttp for HTTP) | Limited | Native sync/async |
| DataFrame Integration | Universal | PyArrow-native | Via fsspec wrapper |
| Maturity | Very mature (2018+) | Mature | New (2025), rapidly evolving |
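
To make the comparison concrete, here is a minimal sketch that reads the same file through each library. It assumes the local filesystem as a stand-in for a remote backend, so it runs without credentials; the file name `sample.csv` and its contents are illustrative only. Each reader runs only if its library is installed. Pointing the same calls at S3, GCS, or Azure would additionally require the relevant backend package (for fsspec, e.g. s3fs or gcsfs) and credentials, which are not shown here.

```python
import importlib.util
import tempfile
from pathlib import Path

# Illustrative payload; any bytes would do.
payload = b"id,value\n1,42\n"
results = {}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.csv"
    path.write_bytes(payload)

    # fsspec: look up a filesystem by protocol name, then use file-like objects.
    if importlib.util.find_spec("fsspec"):
        import fsspec

        fs = fsspec.filesystem("file")
        with fs.open(str(path), "rb") as f:
            results["fsspec"] = f.read()

    # pyarrow.fs: filesystem classes bundled with PyArrow, returning Arrow streams.
    if importlib.util.find_spec("pyarrow"):
        from pyarrow import fs as pafs

        local = pafs.LocalFileSystem()
        with local.open_input_stream(str(path)) as f:
            results["pyarrow.fs"] = f.read()

    # obstore: a store object plus module-level functions such as get and put.
    if importlib.util.find_spec("obstore"):
        import obstore as obs
        from obstore.store import LocalStore

        store = LocalStore(prefix=tmp)
        # get() returns a result object; bytes() materializes the whole body.
        results["obstore"] = bytes(obs.get(store, "sample.csv").bytes())

for name, data in results.items():
    print(f"{name}: read {len(data)} bytes")
```

The shape of each call reflects the table: fsspec exposes Python file objects behind a protocol string, pyarrow.fs hands back Arrow-native streams, and obstore separates the store from the operations applied to it.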