Plant Genomics & Biology

Pipeline for investigating plant genes, metabolic pathways, species taxonomy, and comparative plant biology using ToolUniverse tools.

Reasoning Strategy

Plant genomes are often large and polyploid. Bread wheat is ~17 Gb (vs. ~3 Gb for human) and hexaploid (AABBDD), so most genes have three homeologous copies. When comparing crop genes to Arabidopsis, always account for whole-genome duplications: a single Arabidopsis gene may have 2–4 co-orthologs in a crop species, any of which may have diverged in function. Gene families such as receptor-like kinases, cytochrome P450s, and transcription factors are massively expanded in plants relative to animals, so a BLAST hit does not imply functional equivalence. Arabidopsis thaliana is the primary model, but it is a small, fast-cycling C3 annual, so some features (wood formation, nitrogen-fixing symbiosis, C4 photosynthesis) are absent and must be studied in other species.
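As a rough illustration of keeping homeologs apart, the sketch below groups wheat gene IDs by subgenome. It assumes the IWGSC RefSeq v1.x naming layout (TraesCS, chromosome 1-7, subgenome A/B/D, two-digit version, G, six-digit number); the function names are hypothetical, and the IDs in the usage note are placeholders that follow the pattern, not validated genes.

```python
import re
from collections import defaultdict

# Assumed ID layout: TraesCS<chromosome 1-7><subgenome A/B/D><version>G<number>
WHEAT_ID = re.compile(r"^TraesCS([1-7])([ABD])\d{2}G\d{6}$")


def split_by_subgenome(gene_ids):
    """Group gene IDs by subgenome letter; IDs that do not parse go under 'unassigned'."""
    groups = defaultdict(list)
    for gene_id in gene_ids:
        match = WHEAT_ID.match(gene_id)
        key = match.group(2) if match else "unassigned"
        groups[key].append(gene_id)
    return dict(groups)


def is_complete_triad(gene_ids):
    """True only if there is exactly one parseable copy on each of A, B and D."""
    groups = split_by_subgenome(gene_ids)
    if "unassigned" in groups:
        return False
    return all(len(groups.get(subgenome, [])) == 1 for subgenome in "ABD")
```

Calling is_complete_triad(["TraesCS5A02G000100", "TraesCS5B02G000200", "TraesCS5D02G000300"]) flags a full A/B/D set. A missing or duplicated copy is a cue to check each homeolog separately rather than transfer one annotation to all of them. The ID pattern says nothing about homeology by itself; the candidate set should come from an orthology resource.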

LOOK UP, DON'T GUESS: Do not infer gene function from sequence similarity alone in polyploid species; look up functional validation evidence in UniProt (reviewed entries) or PlantReactome. Do not assume KEGG organism codes; use the table or query kegg_search_pathway with the species name to confirm availability.
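One way to confirm an organism code outside the pipeline is to search KEGG's public organism listing. The sketch below is a minimal example that calls the KEGG REST endpoint directly rather than kegg_search_pathway, whose signature is not shown in this section. It assumes the listing is tab-separated with the organism code in the second column and the species name in the third; the function names are hypothetical.

```python
import urllib.request

KEGG_ORGANISM_URL = "https://rest.kegg.jp/list/organism"


def fetch_organism_listing(url=KEGG_ORGANISM_URL, timeout=30):
    """Download the KEGG organism listing as text (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def find_organism_codes(listing, species_name):
    """Return (code, name) pairs whose name contains species_name, case-insensitively.

    Assumes tab-separated rows: T number, organism code, name, lineage.
    """
    query = species_name.strip().lower()
    hits = []
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        code, name = fields[1], fields[2]
        if query in name.lower():
            hits.append((code, name))
    return hits
```

A typical call would be find_organism_codes(fetch_organism_listing(), "Oryza sativa"). An empty result means the species name did not match any listed organism, so no code should be assumed; more than one hit means the species has several KEGG entries and the right one has to be chosen explicitly.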

Key principles:

Plant-specific pathways: photosynthesis, secondary (specialized) metabolism, and phytohormone signaling have no direct animal counterparts
PlantReactome as foundation: curated plant pathway database with cross-species coverage (Oryza, Arabidopsis, Zea mays, etc.)
Ensembl Plants for genomics: use Ensembl with plant species names for gene lookup and annotation
KEGG for metabolism: KEGG has plant-specific organism codes (ath=Arabidopsis, osa=rice, zma=maize)
Evidence grading: T1 = functional validation (mutant phenotype), T2 = expression/localization data, T3 = ortholog-based prediction, T4 = computational annotation only
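The evidence tiers can be applied mechanically once the available evidence for a gene has been collected. The following is a minimal sketch; the function name and boolean flags are hypothetical, and it assumes that the strongest available evidence determines the tier and that a gene with none of the three evidence types falls back to T4.

```python
def grade_evidence(
    has_mutant_phenotype=False,
    has_expression_or_localization=False,
    has_ortholog_prediction=False,
):
    """Return the strongest evidence tier supported by the flags (assumed rule)."""
    if has_mutant_phenotype:
        return "T1"  # functional validation
    if has_expression_or_localization:
        return "T2"  # expression/localization data
    if has_ortholog_prediction:
        return "T3"  # ortholog-based prediction
    return "T4"  # computational annotation only
```

For example, a crop gene known only through its Arabidopsis ortholog would be graded with grade_evidence(has_ortholog_prediction=True), while a gene with both a mutant phenotype and expression data is still reported at the stronger tier.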

tooluniverse-plant-genomics
