jgi-lakehouse
JGI Lakehouse Skill
Instructions
- Decide whether the task needs metadata from the Lakehouse or files from the JGI filesystem.
- Start with a small validation query, then remove `LIMIT` for final counts or complete result sets.
- Use the query and download patterns below instead of improvising SQL or filesystem paths.
- Record the exact tables, filters, taxon OIDs, file paths, and commands used.
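The validate-then-finalize step can be sketched as a small helper that builds the same statement with or without `LIMIT`, and keeps a record of what was run. This is only an illustration: the helper name `build_query`, the table `example_schema.example_table`, and the taxon OID are hypothetical placeholders, not names taken from the Lakehouse.

```python
from typing import Optional


def build_query(table: str, where: str, limit: Optional[int] = None) -> str:
    """Build a SELECT statement; pass limit for validation, None for the final run."""
    sql = f"SELECT * FROM {table} WHERE {where}"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


# Hypothetical table and filter, used only for illustration.
TABLE = "example_schema.example_table"
FILTER = "taxon_oid = 1234567890"

# Step 1: small validation query to check columns and filters.
validation_sql = build_query(TABLE, FILTER, limit=10)

# Step 2: same statement without LIMIT for final counts or complete results.
final_sql = build_query(TABLE, FILTER)

# Step 3: record exactly what was used so the run can be reproduced.
provenance = {"table": TABLE, "filter": FILTER, "final_sql": final_sql}

print(validation_sql)
print(final_sql)
```

Building both statements from one function keeps the validation and final queries identical apart from the `LIMIT` clause, so a filter that looked right on ten rows is the same filter used for the full result.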
Quick Reference
| Task | Action |
|---|---|
| Find metadata | Query the Lakehouse with SQL |
| Download IMG genome packages | Copy {taxon_oid}.tar.gz from /clusterfs/jgi/img_merfs-ro/img_web/img_web_data/download/ |
| Retrieve Mycocosm or Phytozome files | Query portal.downloadRequestFiles, then copy from /global/dna/dm_archive/ |
| Query metagenome proteins | Use the NUMG tables and join on both oid and gene_oid |
| Inspect schemas | Use SHOW TABLES and DESCRIBE before writing larger joins |
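As a minimal sketch of the IMG genome package row, the path can be derived from the taxon OID and the download directory listed in the table, then copied with the standard library. The function names and the destination directory are assumptions made for this example, and the code presumes it runs on a host where the JGI filesystem is mounted.

```python
import shutil
from pathlib import Path, PurePosixPath

# Download directory as given in the Quick Reference table.
IMG_DOWNLOAD_DIR = PurePosixPath(
    "/clusterfs/jgi/img_merfs-ro/img_web/img_web_data/download"
)


def img_package_path(taxon_oid: int) -> PurePosixPath:
    """Return the expected location of {taxon_oid}.tar.gz."""
    return IMG_DOWNLOAD_DIR / f"{int(taxon_oid)}.tar.gz"


def copy_img_package(taxon_oid: int, dest_dir: str) -> Path:
    """Copy one genome package into dest_dir (hypothetical helper)."""
    src = Path(img_package_path(taxon_oid))
    if not src.is_file():
        raise FileNotFoundError(f"No package found at {src}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 preserves timestamps and returns the destination path.
    return Path(shutil.copy2(src, dest / src.name))
```

Calling `copy_img_package(taxon_oid, "genomes/")` with a real taxon OID would copy the archive into a local `genomes/` directory; logging the value returned by `img_package_path` is an easy way to record the exact file path used.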