Prokka Genome Annotation

Overview

Prokka is a command-line pipeline for rapid annotation of prokaryotic genomes (bacteria and archaea) and viruses. It uses a tiered search strategy: protein-coding genes (CDS) are predicted with Prodigal and searched first against a genus-specific database, then RefSeq proteins, then Pfam/TIGRFAMs HMMs. Non-coding RNA genes are identified with Barrnap (rRNA), Aragorn (tRNA and tmRNA), and Infernal (other non-coding RNAs). Prokka processes a single FASTA assembly in minutes and writes the annotation in GFF3, GenBank, FASTA, and tabular formats.
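
As a rough illustration of a single-assembly run, the sketch below assembles a Prokka command line from Python and only executes it if the prokka executable is found on PATH. The helper names (build_prokka_command, run_prokka), the file names, and the default of 4 CPUs are assumptions made for this example; the flags used (--outdir, --prefix, --kingdom, --cpus, --metagenome) are standard Prokka options.

```python
import shutil
import subprocess
from pathlib import Path


def build_prokka_command(assembly, outdir, prefix, kingdom="Bacteria",
                         metagenome=False, cpus=4):
    """Return the argument list for one Prokka run (nothing is executed here)."""
    cmd = [
        "prokka",
        "--outdir", str(outdir),   # directory that receives all output files
        "--prefix", prefix,        # base name shared by the output files
        "--kingdom", kingdom,      # Archaea, Bacteria, Mitochondria, or Viruses
        "--cpus", str(cpus),
    ]
    if metagenome:
        # Intended for highly fragmented assemblies such as MAGs.
        cmd.append("--metagenome")
    cmd.append(str(assembly))      # the input FASTA goes last
    return cmd


def run_prokka(assembly, outdir, prefix, **options):
    """Run Prokka if it is installed; raise a clear error otherwise."""
    if shutil.which("prokka") is None:
        raise RuntimeError("prokka executable not found on PATH")
    cmd = build_prokka_command(assembly, outdir, prefix, **options)
    subprocess.run(cmd, check=True)
    return Path(outdir)


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the resulting command.
    print(" ".join(build_prokka_command("assembly.fasta", "prokka_out", "sample1")))
```

Keeping command construction separate from execution makes it easy to inspect or log the exact command before committing to a run.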

When to Use

Annotating a newly assembled bacterial or archaeal genome built from Illumina, PacBio, or Nanopore reads
Getting functional protein annotations (CDS with product names, EC numbers, GO terms) from a draft or complete genome
Preparing annotation files for downstream comparative genomics (Roary pan-genome, OrthoFinder)
Annotating viral or phage genomes, where searching a kingdom-specific database (--kingdom Viruses) matters
Performing metagenome-assembled genome (MAG) annotation with the --metagenome flag
Parsing annotated outputs in Python with BioPython for downstream sequence or feature analysis
Use PGAP (NCBI Prokaryotic Genome Annotation Pipeline) instead when the goal is NCBI GenBank submission with standards compliance
Use Bakta instead for faster annotation with built-in NCBI-compatible outputs and a more regularly updated database
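
For the downstream-parsing use case, a quick sanity check on the tabular output can be done with the standard library alone, before reaching for BioPython on the GenBank or GFF3 files. The sketch below assumes the .tsv file has a header row containing at least the ftype, EC_number, and product columns, as in recent Prokka releases; the function name summarize_prokka_tsv and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def summarize_prokka_tsv(handle):
    """Count feature types, hypothetical CDS, and CDS with an EC number."""
    reader = csv.DictReader(handle, delimiter="\t")
    ftype_counts = Counter()
    hypothetical = 0
    with_ec = 0
    for row in reader:
        ftype = row["ftype"]
        ftype_counts[ftype] += 1
        if ftype != "CDS":
            continue
        # Missing or empty cells are treated as empty strings.
        if (row.get("product") or "").strip() == "hypothetical protein":
            hypothetical += 1
        if (row.get("EC_number") or "").strip():
            with_ec += 1
    return {
        "ftype_counts": dict(ftype_counts),
        "hypothetical_cds": hypothetical,
        "cds_with_ec": with_ec,
    }


# Illustrative rows only; real locus tags and products will differ.
SAMPLE_TSV = (
    "locus_tag\tftype\tlength_bp\tgene\tEC_number\tCOG\tproduct\n"
    "S1_00001\tCDS\t1401\tdnaA\t\tCOG0593\tChromosomal replication initiator protein DnaA\n"
    "S1_00002\tCDS\t300\t\t\t\thypothetical protein\n"
    "S1_00003\ttRNA\t76\t\t\t\ttRNA-Ala(tgc)\n"
    "S1_00004\tCDS\t2415\tgyrB\t5.6.2.2\tCOG0187\tDNA gyrase subunit B\n"
)

if __name__ == "__main__":
    print(summarize_prokka_tsv(io.StringIO(SAMPLE_TSV)))
```

With a real run, the same function can be given an open file handle, for example open("prokka_out/sample1.tsv", newline=""), where the path is again hypothetical.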
