COMPUTE, DON'T DESCRIBE

When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.

Model Organism Genetics Pipeline

Map human genes to model organism orthologs and retrieve phenotype, expression, and functional data across six species. Synthesize cross-species evidence to assess gene function conservation and identify the best animal models for studying human genes and diseases.

Not for: human variant interpretation (tooluniverse-variant-analysis), drug target validation (tooluniverse-drug-target-validation), human disease characterization (tooluniverse-multiomic-disease-characterization).

LOOK UP, DON'T GUESS: When asked about a species' taxonomy, ecology, or biology, search GBIF/NCBI Taxonomy first. For GBIF: use GBIF_search_species(query="species name"), then use the nubKey (not key) from the result to call GBIF_get_species(speciesKey=nubKey) for full taxonomy (kingdom, phylum, class, order, family). The nubKey is the GBIF backbone key; the key is dataset-specific and often lacks higher taxonomy.
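
The key selection described above can be sketched in Python. This is a minimal sketch under stated assumptions: the response shape (a dict with a `results` list whose entries carry `key` and, where available, `nubKey`) is assumed from the public GBIF species search API, and the way ToolUniverse wraps that response may differ. The helper name `pick_backbone_key` is hypothetical, and the sample keys are made up for illustration.

```python
from typing import Optional


def pick_backbone_key(search_response: dict) -> Optional[int]:
    """Return the nubKey of the first search hit that has one.

    Assumes a GBIF-style response: {"results": [{"key": ..., "nubKey": ...}, ...]}.
    Returns None when no hit carries a backbone key.
    """
    for hit in search_response.get("results", []):
        nub_key = hit.get("nubKey")
        if nub_key is not None:
            return nub_key
    return None


# Illustrative response with made-up keys (not real GBIF identifiers).
example_response = {
    "results": [
        # Dataset-specific record without a backbone key: skipped.
        {"key": 111, "scientificName": "Danio rerio"},
        # Record linked to the backbone: its nubKey is what we want.
        {"key": 222, "nubKey": 999, "scientificName": "Danio rerio"},
    ]
}

backbone_key = pick_backbone_key(example_response)
print(backbone_key)
```

The returned value would then be passed as speciesKey to GBIF_get_species. Returning None rather than falling back to key keeps the failure visible, since a dataset-specific key often lacks the higher taxonomy.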

Reasoning Principles

Ortholog Reasoning

Sequence conservation across species usually implies functional conservation, but not always. A gene that is highly conserved between mouse and human likely has the same function, yet regulatory differences (when and where the gene is expressed) can produce different phenotypes from the same gene. Always check two things: is the functional protein domain conserved, or only the overall sequence? Are there known regulatory differences? An ortholog with 40% identity and a conserved catalytic domain can be more functionally equivalent than a paralog with 90% identity in the same species.
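
The difference between overall identity and domain identity can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the two pre-aligned sequences are invented toy strings, the domain coordinates are hypothetical, and the function names are not part of any tool. In a real analysis the alignment and the domain boundaries would come from retrieved data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def domain_identity(aligned_a: str, aligned_b: str, start: int, end: int) -> float:
    """Percent identity restricted to alignment columns [start, end)."""
    return percent_identity(aligned_a[start:end], aligned_b[start:end])


# Invented toy alignment: divergent flanks around an identical 7-residue motif.
human_seq = "MSTLQPAVKEHGDSGKTLLR"
ortholog_seq = "MAVIEPRLNDHGDSGKTIVK"
DOMAIN_START, DOMAIN_END = 10, 17  # hypothetical domain columns (0-based, end-exclusive)

overall = percent_identity(human_seq, ortholog_seq)
in_domain = domain_identity(human_seq, ortholog_seq, DOMAIN_START, DOMAIN_END)
print(f"overall identity: {overall:.1f}%")
print(f"domain identity:  {in_domain:.1f}%")
```

In this toy case the overall identity is below 50% while the domain columns are identical, which is the pattern the paragraph above warns would be missed by looking at whole-sequence identity alone.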

Paralog contamination is a common pitfall. Gene families (e.g., FOXP1/2/3/4, HOX clusters) generate false ortholog hits. Distinguish true orthologs from paralogs by checking synteny (conserved gene neighborhood) and homology type: 1:1 = likely true ortholog; 1:many or many:many = likely paralog expansion. If the target species has a single gene where humans have multiple (e.g., one fly FoxP vs four human FOXPs), that single gene is the co-ortholog of all the human paralogs. Note this explicitly.
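
The homology-type check can be sketched by counting partners on each side of a list of ortholog pairs. This is a simplified illustration, not how any particular orthology database assigns its calls: the function name `classify_homology` is hypothetical, labels are written as human:target, and GENE_A, GENE_B and their partners are placeholder names. Only the FoxP example comes from the text above.

```python
from collections import defaultdict


def classify_homology(pairs):
    """Label each (human_gene, target_gene) pair as 1:1, 1:many, many:1, or many:many.

    The label is written human:target, so "many:1" means several human genes
    map to a single gene in the target species.
    """
    targets_per_human = defaultdict(set)
    humans_per_target = defaultdict(set)
    for human, target in pairs:
        targets_per_human[human].add(target)
        humans_per_target[target].add(human)

    labels = {}
    for human, target in pairs:
        human_side = "1" if len(humans_per_target[target]) == 1 else "many"
        target_side = "1" if len(targets_per_human[human]) == 1 else "many"
        labels[(human, target)] = f"{human_side}:{target_side}"
    return labels


pairs = [
    # One fly FoxP gene for four human FOXP genes (example from the text).
    ("FOXP1", "FoxP"), ("FOXP2", "FoxP"), ("FOXP3", "FoxP"), ("FOXP4", "FoxP"),
    # Placeholder genes, for illustration only.
    ("GENE_A", "gene_a"),
    ("GENE_B", "gene_b1"), ("GENE_B", "gene_b2"),
]

labels = classify_homology(pairs)
for pair, label in labels.items():
    print(pair, label)
```

Pairs labelled anything other than 1:1 are the ones to flag in the report, together with a synteny check where neighborhood data is available.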

tooluniverse-model-organism-genetics
