polars-bio
Originally from k-dense-ai/claude-scientific-skills
Overview
polars-bio is a high-performance Python library for genomic interval operations and bioinformatics file I/O, built on Polars, Apache Arrow, and Apache DataFusion. It provides a familiar DataFrame-centric API for interval arithmetic (overlap, nearest, merge, coverage, complement, subtract) and reading/writing common bioinformatics formats (BED, VCF, BAM, CRAM, GFF/GTF, FASTA, FASTQ).
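To make two of the interval operations named above concrete, here is a minimal pure-Python sketch of what overlap and merge compute. It is not polars-bio's implementation or API: the helper names `overlap` and `merge`, the tuple layout `(chrom, start, end)`, the 0-based half-open (BED-style) coordinates, and the choice to merge touching intervals are all assumptions made for illustration. The library's own coordinate convention and defaults may differ.

```python
# Illustrative only: hypothetical helpers, not the polars-bio API.
# Intervals are (chrom, start, end) tuples, assumed 0-based and half-open.

def overlap(a, b):
    """Return every pair (x, y), x from a and y from b, sharing at least one base."""
    # Group the second set by chromosome so only same-chromosome pairs are compared.
    by_chrom = {}
    for chrom, start, end in b:
        by_chrom.setdefault(chrom, []).append((chrom, start, end))

    pairs = []
    for chrom, start, end in a:
        for other in by_chrom.get(chrom, []):
            # Half-open intervals overlap when each starts before the other ends.
            if start < other[2] and other[1] < end:
                pairs.append(((chrom, start, end), other))
    return pairs


def merge(intervals):
    """Collapse overlapping or touching intervals on each chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


reads = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
genes = [("chr1", 180, 250), ("chr2", 80, 120)]

print(overlap(reads, genes))  # two chr1 pairs; chr2 intervals only touch at 80
print(merge(reads))           # [('chr1', 100, 300), ('chr2', 50, 80)]
```

The nested loop is quadratic per chromosome and only meant to show the semantics; a library built for genome-scale data would use an indexed or sorted-sweep algorithm instead.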
Key value propositions:
- 6-38x faster than bioframe on real-world genomic benchmarks
- Streaming/out-of-core support for large genomes via DataFusion
- Cloud-native file I/O (S3, GCS, Azure) with predicate pushdown
- Two API styles: functional (pb.overlap(df1, df2)) and method-chaining (df1.lazy().pb.overlap(df2))
- SQL interface for genomic data via DataFusion SQL engine