Bakta Genome Annotation

Overview

Bakta is a command-line pipeline for rapid, standardized annotation of bacterial and archaeal genomes and plasmids. It predicts CDSs with Prodigal (through the Pyrodigal bindings in recent releases), tRNAs with tRNAscan-SE, tmRNAs with Aragorn, rRNAs and other non-coding RNAs with Infernal, and CRISPR arrays with PILER-CR. Gene names, EC numbers, GO terms, and COG categories come from a tiered search: exact hash lookups against UniRef100-derived protein records (UPS/IPS), then DIAMOND alignments against UniRef90-based protein clusters, complemented by HMM searches. Bakta produces NCBI-compatible outputs (GFF3, GenBank, EMBL, and FASTA files for the genome, feature nucleotide sequences, and proteins, plus a JSON summary and a circular genome plot) for a typical 5 Mb genome in 5–15 minutes on 8 CPUs.
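Because the JSON summary holds every feature in one file, downstream scripts can read it directly. The sketch below counts features per type and tallies hypothetical proteins. It assumes the JSON has a top-level "features" list whose entries carry "type" and "product" keys; these key names are an assumption to check against the output of your Bakta version, and the helper name summarize_bakta_json is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_bakta_json(path):
    """Hypothetical helper: count features by type in a Bakta JSON file.

    Assumption: the file has a top-level "features" list and each
    feature has a "type" string and an optional "product" string.
    Verify these keys against the Bakta version you run.
    """
    data = json.loads(Path(path).read_text())
    features = data.get("features", [])
    # Tally feature types; features without a "type" are grouped as "unknown".
    counts = Counter(feature.get("type", "unknown") for feature in features)
    # Count features whose product name mentions "hypothetical".
    hypothetical = sum(
        1
        for feature in features
        if "hypothetical" in (feature.get("product") or "").lower()
    )
    return counts, hypothetical
```

For example, `counts, n_hypo = summarize_bakta_json("out/sample.json")` returns a Counter of feature types and the number of hypothetical products, which is a quick sanity check before parsing GFF3 or GenBank files.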

When to Use

Annotating bacterial or archaeal genome assemblies (Illumina, PacBio, Nanopore) with NCBI-compatible locus tags and product names
Annotating plasmids and other circular replicons, using --complete to mark input sequences as complete replicons and --plasmid to set the plasmid name (or a replicon table via --replicons)
Producing JSON-structured annotation outputs that can be parsed without GenBank or GFF3 detours
Generating a publication-ready circular genome plot via the bundled bakta_plot command
Annotating MAGs (metagenome-assembled genomes) with --meta, which runs CDS prediction in Prodigal's metagenome mode instead of training on the input sequence
Use Prokka instead when you need viral/mitochondrial kingdoms or when you must reproduce a legacy Prokka pipeline exactly
Use PGAP instead when submitting to NCBI GenBank with full standards compliance
Use Bakta when you want faster runs, regularly updated UniRef-derived databases, AMRFinderPlus integration, and a JSON summary out of the box
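The use cases above differ mainly in which flags are passed. As a minimal sketch, the hypothetical helper below assembles a Bakta argument list for subprocess.run(). The flags it uses (--db, --output, --prefix, --threads, --complete, --meta, --plasmid, --locus-tag) are drawn from the Bakta CLI as we understand it; confirm them with `bakta --help` for your installed version.

```python
import shlex


def build_bakta_command(genome_fasta, db_dir, out_dir, prefix,
                        threads=8, complete=False, meta=False,
                        plasmid_name=None, locus_tag=None):
    """Hypothetical helper: build a Bakta argv list (not executed here).

    Flag names are assumptions based on the Bakta CLI; check
    `bakta --help` before relying on them.
    """
    cmd = [
        "bakta",
        "--db", str(db_dir),
        "--output", str(out_dir),
        "--prefix", prefix,
        "--threads", str(threads),
    ]
    if complete:
        cmd.append("--complete")  # all input sequences are complete replicons
    if meta:
        cmd.append("--meta")  # metagenome-mode CDS prediction, e.g. for MAGs
    if plasmid_name:
        cmd += ["--plasmid", plasmid_name]  # name recorded for the plasmid
    if locus_tag:
        cmd += ["--locus-tag", locus_tag]  # prefix for generated locus tags
    cmd.append(str(genome_fasta))  # input assembly goes last
    return cmd


# Example: a complete plasmid sequence annotated under the name "pA".
cmd = build_bakta_command("plasmid.fasta", "db", "out", "pA",
                          complete=True, plasmid_name="pA")
print(shlex.join(cmd))
```

Returning a list rather than a shell string keeps paths with spaces safe; pass it to `subprocess.run(cmd, check=True)` to launch the run and fail loudly on a non-zero exit.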
