gtars


GTARS: Fast Genomic Token Arithmetic and BED File Processing

Overview

GTARS is a Python library with a Rust-backed core for high-performance genomic interval operations. It provides BED file I/O, set-theoretic interval operations (intersection, union, merge, complement, subtraction), tokenization of genomic regions against a reference universe, and utilities for building consensus universe BED files. GTARS is designed for workflows that process hundreds to thousands of BED files, and it serves as a preprocessing engine for ML pipelines (including geniml) and for general bioinformatics work.
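To make the set-theoretic operations concrete, here is a minimal pure-Python sketch of merge and intersection on BED-style half-open intervals. It illustrates the semantics only: it does not use the GTARS API, and the function names `merge_intervals` and `intersect_intervals` are hypothetical, chosen for this example.

```python
from typing import List, Tuple

# A BED-style record: (chrom, start, end), 0-based and half-open [start, end).
Interval = Tuple[str, int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or book-ended intervals on the same chromosome."""
    merged: List[Interval] = []
    # Sorting tuples orders by chromosome name, then start, then end.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect_intervals(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every (a, b) pair that overlaps.

    This is a quadratic scan kept simple for illustration; a real engine
    would use sorted sweeps or interval trees.
    """
    out: List[Interval] = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # non-empty overlap under half-open coordinates
                out.append((chrom_a, start, end))
    return sorted(out)


peaks = [("chr1", 10, 20), ("chr1", 15, 30), ("chr2", 5, 8), ("chr1", 40, 50)]
merged_peaks = merge_intervals(peaks)
shared = intersect_intervals([("chr1", 10, 30)], [("chr1", 25, 45), ("chr2", 10, 30)])
```

The sketch treats touching intervals (end of one equal to start of the next) as mergeable; whether a given tool does the same is a configuration detail worth checking before comparing outputs.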

When to Use

  • Read and write large BED files efficiently, leveraging Rust-backed parsing for speed over pure Python alternatives
  • Compute genomic interval intersections, merges, complements, or subtractions between BED file pairs or sets
  • Tokenize a collection of genomic regions against a fixed universe vocabulary for ML input preparation
  • Build consensus universe BED files from a collection of sample BED files
  • Count overlap statistics between two BED files without launching bedtools processes
  • Preprocess ATAC-seq, ChIP-seq, or ENCODE peak files before feeding into geniml or other ML tools
  • For full BED/BAM/SAM reading with CIGAR-level detail, use pysam-genomic-files instead
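
As an illustration of what tokenizing regions against a fixed universe means, the sketch below maps each query region to the indices of the universe regions it overlaps. It is plain Python written for this example, not the GTARS tokenizer: the `tokenize` function and the sample coordinates are assumptions, and a real tokenizer may handle non-overlapping regions differently (for example, with an unknown token).

```python
from typing import List, Tuple

# A BED-style record: (chrom, start, end), 0-based and half-open [start, end).
Interval = Tuple[str, int, int]


def tokenize(regions: List[Interval], universe: List[Interval]) -> List[int]:
    """Map query regions to indices of overlapping universe regions.

    Each universe region acts as one vocabulary entry; its position in the
    list is the token ID. Regions with no overlap produce no token here.
    """
    tokens: List[int] = []
    for chrom, start, end in regions:
        for token_id, (u_chrom, u_start, u_end) in enumerate(universe):
            # Two half-open intervals overlap when each starts before the other ends.
            if chrom == u_chrom and start < u_end and u_start < end:
                tokens.append(token_id)
    return tokens


# Hypothetical three-region universe and two query peaks.
universe = [("chr1", 0, 100), ("chr1", 100, 200), ("chr2", 0, 100)]
sample_peaks = [("chr1", 50, 150), ("chr2", 200, 300)]
token_ids = tokenize(sample_peaks, universe)
```

Here the first peak spans two universe regions and yields two tokens, while the second peak overlaps nothing and is dropped. The linear scan over the universe is for readability only; it would not scale to the file counts mentioned in the overview.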

Prerequisites

More from jaechang-hits/sciagent-skills

First Seen
Mar 16, 2026