jgi-lakehouse

JGI Lakehouse Skill

Instructions

  1. Decide whether the task needs metadata from the Lakehouse or files from the JGI filesystem.
  2. Start with a small validation query that uses LIMIT; once the results look right, remove the LIMIT to get final counts or complete result sets.
  3. Use the query and download patterns below instead of improvising SQL or filesystem paths.
  4. Record the exact tables, filters, taxon OIDs, file paths, and commands used.
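
Steps 2 and 4 can be sketched in Python as follows. This is a minimal illustration, not part of the skill itself: the helper names `build_query` and `provenance_record`, the table `example_schema.example_taxon_table`, and its columns are hypothetical placeholders. Building both queries from one base statement keeps the filters identical between the validation run and the final run.

```python
import json
from typing import List, Optional


def build_query(base_sql: str, limit: Optional[int] = None) -> str:
    """Return base_sql, optionally with a LIMIT clause appended."""
    sql = base_sql.strip().rstrip(";")
    if limit is not None:
        if limit <= 0:
            raise ValueError("limit must be a positive integer")
        sql = f"{sql}\nLIMIT {int(limit)}"
    return sql


def provenance_record(sql: str, tables: List[str], notes: str = "") -> str:
    """Serialize what was run so the analysis can be repeated later."""
    return json.dumps({"sql": sql, "tables": tables, "notes": notes}, indent=2)


# Hypothetical table and column names, for illustration only.
BASE_SQL = """
SELECT taxon_oid, taxon_name
FROM example_schema.example_taxon_table
WHERE domain = 'Bacteria'
"""

validation_sql = build_query(BASE_SQL, limit=10)  # step 2: small check first
final_sql = build_query(BASE_SQL)                 # same filters, no LIMIT

# Step 4: keep a record of the exact statement and tables used.
record = provenance_record(final_sql, ["example_schema.example_taxon_table"])
```

The strings would then be passed to whatever SQL client is used to reach the Lakehouse; that client is deliberately left out here because the source does not name one.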

Quick Reference

Task                                  Action
Find metadata                         Query the Lakehouse with SQL
Download IMG genome packages          Copy {taxon_oid}.tar.gz from /clusterfs/jgi/img_merfs-ro/img_web/img_web_data/download/
Retrieve Mycocosm or Phytozome files  Query portal.downloadRequestFiles, then copy from /global/dna/dm_archive/
Query metagenome proteins             Use the NUMG tables and join on both oid and gene_oid
Inspect schemas                       Use SHOW TABLES and DESCRIBE before writing larger joins
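
The IMG package download in the table can be sketched as below. This is a rough illustration under stated assumptions: the function names `img_package_path` and `copy_img_package` are hypothetical, and the check that a taxon OID consists only of digits is an assumption made to keep stray characters out of the path. The directory and the `{taxon_oid}.tar.gz` naming come from the table above.

```python
import shutil
from pathlib import Path

# Read-only download directory named in the quick reference.
IMG_DOWNLOAD_DIR = Path("/clusterfs/jgi/img_merfs-ro/img_web/img_web_data/download")


def img_package_path(taxon_oid, root=IMG_DOWNLOAD_DIR) -> Path:
    """Build the expected path of {taxon_oid}.tar.gz under root."""
    oid = str(taxon_oid)
    if not oid.isdigit():  # assumption: taxon OIDs are purely numeric
        raise ValueError(f"unexpected taxon OID: {taxon_oid!r}")
    return Path(root) / f"{oid}.tar.gz"


def copy_img_package(taxon_oid, dest_dir, root=IMG_DOWNLOAD_DIR) -> Path:
    """Copy one genome package into dest_dir and return the new path."""
    src = img_package_path(taxon_oid, root)
    if not src.is_file():
        raise FileNotFoundError(src)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dest / src.name))
```

Calling `copy_img_package(oid, "./img_packages")` for each taxon OID returned by a metadata query would collect the archives in one working directory. The `root` parameter exists only so the function can be tried against a test directory.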
First seen: Feb 19, 2026