polars-bio

Overview

polars-bio is a high-performance Python library for genomic interval operations and bioinformatics file I/O, built on Polars, Apache Arrow, and Apache DataFusion. It provides a DataFrame-centric API for interval arithmetic (overlap, nearest, merge, coverage, complement, subtract) and for reading and writing common bioinformatics formats (BED, VCF, BAM, CRAM, GFF/GTF, FASTA, FASTQ).
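
To make "overlap" concrete, here is a minimal pure-Python sketch of what an interval overlap join computes. It is not polars-bio's implementation and does not use its API. The `Interval` type, the `overlap` helper, and the 0-based half-open coordinate convention are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive (assumed 0-based)
    end: int    # exclusive (assumed half-open)


def overlap(left, right):
    """Return every (a, b) pair on the same chromosome whose ranges intersect.

    Naive nested-loop version for clarity; real libraries use interval
    trees or similar index structures instead.
    """
    # Group the right-hand intervals by chromosome so that we only
    # compare intervals that could possibly overlap.
    by_chrom = defaultdict(list)
    for b in right:
        by_chrom[b.chrom].append(b)

    pairs = []
    for a in left:
        for b in by_chrom.get(a.chrom, []):
            # Half-open ranges intersect when each starts before the other ends.
            if a.start < b.end and b.start < a.end:
                pairs.append((a, b))
    return pairs


reads = [
    Interval("chr1", 100, 200),
    Interval("chr1", 500, 600),
    Interval("chr2", 10, 50),
]
genes = [
    Interval("chr1", 150, 300),
    Interval("chr1", 200, 250),  # starts exactly where the first read ends
    Interval("chr2", 0, 20),
]

hits = overlap(reads, genes)
for a, b in hits:
    print(a, "overlaps", b)
```

Under the half-open assumption, the read ending at 200 does not overlap the gene starting at 200. With closed (1-based inclusive) coordinates it would, so it is worth checking which convention your data and the library call use.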

Key value propositions:

  • 6-38x faster than bioframe on real-world genomic benchmarks
  • Streaming/out-of-core support for large genomes via DataFusion
  • Cloud-native file I/O (S3, GCS, Azure) with predicate pushdown
  • Two API styles: functional (pb.overlap(df1, df2)) and method-chaining (df1.lazy().pb.overlap(df2))
  • SQL interface for genomic data via DataFusion SQL engine
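
The idea behind streaming interval processing can be sketched without any library. The generator below merges overlapping intervals from a coordinate-sorted stream while holding only one interval in memory at a time. It is an illustrative sketch, not how polars-bio or DataFusion implement streaming. The `merge_sorted` name, the tuple layout, and the choice to merge touching intervals are assumptions.

```python
from typing import Iterable, Iterator, Tuple

# (chrom, start, end); half-open coordinates are assumed here.
Interval = Tuple[str, int, int]


def merge_sorted(intervals: Iterable[Interval]) -> Iterator[Interval]:
    """Merge overlapping or touching intervals from a sorted stream.

    Input must be sorted by (chrom, start). Only the interval currently
    being built is kept in memory, so the input can be arbitrarily large.
    """
    current = None
    for chrom, start, end in intervals:
        if current is not None and chrom == current[0] and start <= current[2]:
            # Same chromosome and no gap: extend the interval being built.
            current = (chrom, current[1], max(current[2], end))
        else:
            # Gap or new chromosome: emit what we have and start over.
            if current is not None:
                yield current
            current = (chrom, start, end)
    if current is not None:
        yield current


stream = iter([
    ("chr1", 1, 5),
    ("chr1", 4, 10),
    ("chr1", 10, 12),
    ("chr1", 20, 30),
    ("chr2", 1, 3),
])

merged = list(merge_sorted(stream))
print(merged)
```

Whether touching intervals such as (4, 10) and (10, 12) should be merged is a design choice. This sketch merges them; the library's own merge operation may define adjacency differently.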

When to Use This Skill

Use this skill when:

  • Performing genomic interval operations (overlap, nearest, merge, coverage, complement, subtract)
  • Reading/writing bioinformatics file formats (BED, VCF, BAM, CRAM, GFF/GTF, FASTA, FASTQ)
  • Processing large genomic datasets that don't fit in memory (streaming mode)
First Seen
Apr 9, 2026