bakta-genome-annotation


Bakta Genome Annotation

Overview

Bakta is a command-line pipeline for rapid, standardized annotation of bacterial and archaeal genomes and plasmids. It combines Prodigal for CDS prediction, tRNAscan-SE, Aragorn, and Infernal for non-coding RNAs, and PILER-CR for CRISPR detection. Gene names, EC numbers, GO terms, and COG categories are assigned through a tiered DIAMOND/HMM search against a curated UniRef100 + IPS/UPS database. Bakta writes NCBI-compatible outputs (GFF3, GenBank, EMBL, and FASTA, plus a JSON summary and a circular genome plot). A typical 5 Mb genome takes roughly 5–15 minutes on 8 CPUs.
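As a rough illustration of the pipeline described above, the sketch below assembles a single Bakta invocation from Python. The helper name `build_bakta_command`, the file `assembly.fasta`, and the directory names are hypothetical placeholders; the options used (`--db`, `--output`, `--prefix`, `--threads`, `--complete`, `--meta`) are Bakta command-line options, but check `bakta --help` for the version you have installed.

```python
import shlex
import shutil
import subprocess
from pathlib import Path


def build_bakta_command(assembly, db_dir, out_dir, prefix, threads=8,
                        complete=False, meta=False):
    """Return the argv list for one Bakta run (nothing is executed here)."""
    cmd = [
        "bakta",
        "--db", str(db_dir),        # path to a downloaded Bakta database
        "--output", str(out_dir),   # directory for all output files
        "--prefix", prefix,         # file name prefix for the outputs
        "--threads", str(threads),
    ]
    if complete:
        cmd.append("--complete")    # all sequences are complete replicons
    if meta:
        cmd.append("--meta")        # metagenome mode for CDS prediction
    cmd.append(str(assembly))       # the assembly FASTA is the positional argument
    return cmd


# Hypothetical paths, for illustration only.
cmd = build_bakta_command("assembly.fasta", "db", "bakta_out", "sample1")
print(shlex.join(cmd))

# Run only when Bakta is installed and the placeholder input really exists.
if shutil.which("bakta") and Path("assembly.fasta").exists():
    subprocess.run(cmd, check=True)
```

Building the argument list separately from running it keeps the command easy to log and to reuse across many assemblies.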

When to Use

  • Annotating bacterial or archaeal genome assemblies (Illumina, PacBio, Nanopore) with NCBI-compatible locus tags and product names
  • Annotating plasmids and other circular replicons separately, using the --complete flag and --plasmid with a plasmid name
  • Producing JSON-structured annotation outputs that can be parsed directly, without going through GenBank or GFF3 parsers
  • Generating a publication-ready circular genome plot via the bundled bakta_plot command
  • Annotating MAGs (metagenome-assembled genomes) with --meta to disable Prodigal training
  • Use Prokka instead when you need viral/mitochondrial kingdoms or when you must reproduce a legacy Prokka pipeline exactly
  • Use PGAP instead when submitting to NCBI GenBank with full standards compliance
  • Use Bakta when you want faster runs, regularly updated UniRef-derived databases, AMRFinderPlus integration, and a JSON summary out of the box
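The JSON output mentioned in the list above can be read with the standard library alone. The following is a minimal sketch, assuming the JSON file has a top-level "features" list whose entries carry a "type" key; that layout is an assumption to verify against your own Bakta output, and the `example` dictionary is a hand-made stand-in, not real Bakta output. The function names are hypothetical.

```python
import json
from collections import Counter


def load_annotation(json_path):
    """Load a Bakta JSON file from disk into a Python dictionary."""
    with open(json_path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_features(annotation):
    """Count features by type.

    Assumes a top-level "features" list whose entries have a "type" key;
    entries without one are counted as "unknown".
    """
    features = annotation.get("features", [])
    return Counter(feature.get("type", "unknown") for feature in features)


# Hand-made stand-in for a parsed Bakta JSON document (not real output).
example = {
    "features": [
        {"type": "cds", "product": "hypothetical protein"},
        {"type": "cds", "product": "DNA gyrase subunit A"},
        {"type": "tRNA", "product": "tRNA-Ala"},
    ]
}

counts = summarize_features(example)
print(dict(counts))
```

With a real run, `summarize_features(load_annotation("bakta_out/sample1.json"))` would be the equivalent call, where the path is again a placeholder.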

Prerequisites
