tooluniverse-epigenomics


Genomics and Epigenomics Data Processing

⚠️ TOP-OF-MIND RULE: long-format methylation CSV — count ROWS, not unique positions

When the input is a long-format methylation CSV (one row per (sample, CpG position) pair, e.g. with columns Pos, Chromosome, MethylationPercentage), the question "how many sites are removed when filtering" almost always means rows removed, NOT unique positions removed. The two answers differ by a factor of ≈ n_samples.

| Question phrasing | What it means |
| --- | --- |
| "how many sites are removed when filtering …" | rows removed (= samples × positions failing the filter) |
| "how many unique CpG sites pass filter" | unique positions (dedupe by Pos, then filter) |

❌ WRONG: filtered = df.drop_duplicates(["Pos"]).query("MethylationPercentage<10 or MethylationPercentage>90") then len(filtered) → counts unique positions (typically 100–1500)

✅ RIGHT: filtered = df.query("MethylationPercentage<10 or MethylationPercentage>90") then len(df) - len(filtered) → counts rows removed (typically 10k–30k)
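
To make the difference concrete, here is a minimal sketch on a made-up table of 3 samples × 4 positions. The "Sample" column name and every value are assumptions for illustration only; the thresholds and the Pos, Chromosome and MethylationPercentage columns follow the snippets above.

```python
import pandas as pd

# Toy long-format table: one row per (sample, position), 3 x 4 = 12 rows.
# The "Sample" column name and all values are hypothetical.
df = pd.DataFrame({
    "Sample": ["S1"] * 4 + ["S2"] * 4 + ["S3"] * 4,
    "Chromosome": ["chr1"] * 12,
    "Pos": [100, 200, 300, 400] * 3,
    "MethylationPercentage": [5, 95, 40, 9, 8, 92, 50, 91, 50, 97, 60, 45],
})

QUERY = "MethylationPercentage < 10 or MethylationPercentage > 90"

# Row-level count: filter every (sample, position) row, then count what is gone.
filtered = df.query(QUERY)
rows_removed = len(df) - len(filtered)

# Unique-position count: deduplicating by Pos keeps only the first sample's
# value for each position, so the other samples never reach the filter.
dedup_filtered = df.drop_duplicates(["Pos"]).query(QUERY)
unique_positions_kept = len(dedup_filtered)

print(f"rows removed: {rows_removed}")
print(f"unique positions kept after dedupe: {unique_positions_kept}")
```

On this toy table the two approaches already disagree; on real data with many samples the gap grows roughly in proportion to the number of samples.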

First Seen
Feb 16, 2026
tooluniverse-epigenomics — mims-harvard/tooluniverse