
Geniml: Genomic Interval Machine Learning

Overview

Geniml is a Python library that bridges genomic interval biology and machine learning. It provides region2vec for learning dense vector representations of genomic regions from BED files, BEDSpace for nearest-neighbor search in embedding space, dataset classes for ML-ready genomic interval loading, and evaluation utilities for embedding quality. Geniml is designed for researchers who want to apply modern ML techniques to chromatin accessibility, histone modification, or other region-based genomic data.
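Before region embeddings can be learned, the intervals in each BED file have to be expressed over a shared vocabulary of regions. The sketch below assumes a word2vec-style setup, in which each BED file becomes a "sentence" whose "words" are the regions of a fixed reference set (a "universe") that its intervals overlap. It shows only that tokenization step, in plain Python. `tokenize`, the `(chrom, start, end)` tuple layout, and the toy universe are hypothetical illustrations, not geniml's API. Overlap uses BED-style half-open coordinates, so touching intervals (one ends at 100, the next starts at 100) do not count as overlapping.

```python
from collections import defaultdict

# Hypothetical interval layout: (chrom, start, end), BED-style half-open.
Interval = tuple[str, int, int]


def tokenize(regions: list[Interval], universe: list[Interval]) -> list[int]:
    """Map each input region to the indices of the universe regions it overlaps.

    Illustrative sketch only (not geniml's API). Returns token ids in input
    order; a region overlapping several universe regions yields several ids.
    """
    # Group universe regions by chromosome so each query only scans its chromosome.
    by_chrom: dict[str, list[tuple[int, int, int]]] = defaultdict(list)
    for idx, (chrom, start, end) in enumerate(universe):
        by_chrom[chrom].append((start, end, idx))

    tokens: list[int] = []
    for chrom, start, end in regions:
        for u_start, u_end, idx in by_chrom.get(chrom, []):
            # Half-open intervals overlap iff each one starts before the other ends.
            if start < u_end and u_start < end:
                tokens.append(idx)
    return tokens


universe = [("chr1", 0, 100), ("chr1", 100, 200), ("chr2", 0, 50)]
bed_file = [("chr1", 50, 150), ("chr2", 10, 20), ("chr3", 0, 10)]
print(tokenize(bed_file, universe))  # [0, 1, 2]; chr3 has no universe regions
```

Once every BED file is a list of token ids like this, the token sequences can be fed to a word2vec-style trainer to obtain one vector per universe region.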

When to Use

  • Learn dense embeddings of genomic regions from a collection of BED files to enable ML-based analysis (region2vec)
  • Cluster chromatin accessibility peaks or histone modification sites by embedding similarity
  • Search for genomic regions similar to a query region using approximate nearest-neighbor search (BEDSpace)
  • Build training datasets for ML models from BED-format genomic intervals with a PyTorch-compatible interface
  • Compare embedding quality across training runs or datasets using quantitative metrics
  • Integrate genomic region representations into custom neural network architectures
  • For basic BED file parsing and set operations without ML, use gtars or pysam-genomic-files instead
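Searching for regions similar to a query, as in the BEDSpace use case above, comes down to ranking stored vectors by their similarity to a query vector. The sketch below does an exact, brute-force cosine-similarity search in plain Python as a stand-in for an approximate nearest-neighbor index. `cosine`, `nearest`, the region-name keys, and the 2-D toy vectors are hypothetical and not part of geniml.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query: list[float], embeddings: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k stored regions most similar to the query (exact search)."""
    scored = [(name, cosine(query, vec)) for name, vec in embeddings.items()]
    # Highest similarity first; the scan is O(n) per query, unlike an ANN index.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embeddings keyed by a hypothetical "chrom:start-end" region name.
embeddings = {
    "chr1:0-100": [1.0, 0.0],
    "chr1:100-200": [0.7, 0.7],
    "chr2:0-50": [0.0, 1.0],
}
for name, score in nearest([1.0, 0.1], embeddings, k=2):
    print(name, round(score, 3))
```

The exact scan is fine for small collections. Approximate nearest-neighbor indexes exist because they trade a small loss in recall for much faster queries over very large embedding collections.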

Prerequisites
