Prokka Genome Annotation

Overview

Prokka is a command-line pipeline for rapid annotation of prokaryotic genomes (bacteria and archaea) and viral genomes. Protein-coding genes (CDS) are predicted with Prodigal and then annotated by a tiered search that stops at the first hit: an optional genus-specific database (enabled with --usegenus), then broader kingdom-level protein databases, then profile HMMs such as Pfam and TIGRFAMs. Proteins with no match at any tier are labelled "hypothetical protein". Non-coding RNA genes are identified with Barrnap (rRNA), Aragorn (tRNA and tmRNA), and Infernal (other ncRNA). Prokka takes a single FASTA assembly, typically finishes in minutes, and writes the annotation in GFF3, GenBank, FASTA, and tabular formats.
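
The tiered lookup described above can be pictured as a "first hit wins" loop with a fallback label. The following is a toy sketch only, not Prokka's implementation: the tier names, the three-letter "sequences", and the helper names `make_lookup_tier` and `annotate` are all hypothetical, and plain dictionary lookups stand in for the real similarity and HMM searches.

```python
from typing import Callable, Optional

# A tier takes a protein sequence and returns a product name, or None on a miss.
Tier = Callable[[str], Optional[str]]


def make_lookup_tier(table: dict[str, str]) -> Tier:
    """Wrap a toy dict as a search tier (stand-in for a BLAST or HMM search)."""
    return table.get


def annotate(protein: str, tiers: list[tuple[str, Tier]]) -> tuple[str, str]:
    """Return (product, tier_name) from the first tier that reports a hit."""
    for name, search in tiers:
        product = search(protein)
        if product is not None:
            return product, name
    # No tier matched: fall back to the generic label.
    return "hypothetical protein", "none"


# Hypothetical toy data, ordered from most specific to most general.
tiers = [
    ("genus-specific", make_lookup_tier({"MKT": "DNA gyrase subunit A"})),
    ("kingdom-level", make_lookup_tier({"MKT": "gyrase", "MAL": "ATP synthase subunit beta"})),
    ("HMM profiles", make_lookup_tier({"MSE": "ABC transporter family protein"})),
]

results = {p: annotate(p, tiers) for p in ["MKT", "MAL", "MSE", "MQQ"]}
for protein, (product, tier) in results.items():
    print(f"{protein}\t{product}\t{tier}")
```

Ordering the tiers from most to least specific means a precise product name is preferred whenever one exists, and the less specific tiers are consulted only for proteins that are still unannotated.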

When to Use

  • Annotating a newly assembled bacterial or archaeal genome, whether the assembly comes from Illumina, PacBio, or Nanopore reads
  • Getting functional protein annotations (CDS with product names, gene names, EC numbers, and COG identifiers where available) from a draft or complete genome
  • Preparing annotation files for downstream comparative genomics (Roary pan-genome, OrthoFinder)
  • Annotating viral or phage genomes, where selecting the kingdom-specific databases with --kingdom Viruses matters
  • Annotating metagenome-assembled genomes (MAGs) with the --metagenome flag
  • Parsing annotated outputs in Python with BioPython for downstream sequence or feature analysis
  • Use PGAP (NCBI Prokaryotic Genome Annotation Pipeline) instead when the goal is a standards-compliant NCBI GenBank submission
  • Use Bakta instead for faster annotation with built-in NCBI-compatible outputs and a more regularly updated database
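
Several of the use cases above differ only in which Prokka options are passed. As a minimal sketch, the helper below assembles a command line from Python without running it. The function name `build_prokka_command`, its defaults, and the file names are hypothetical; the options it emits (--outdir, --prefix, --kingdom, --genus, --usegenus, --metagenome, --cpus) are Prokka command-line options, but check `prokka --help` for your installed version before relying on them.

```python
import shlex
from typing import Optional

# Kingdom values accepted by Prokka's --kingdom option.
VALID_KINGDOMS = {"Archaea", "Bacteria", "Mitochondria", "Viruses"}


def build_prokka_command(
    assembly: str,
    outdir: str,
    prefix: str,
    kingdom: str = "Bacteria",
    genus: Optional[str] = None,
    metagenome: bool = False,
    cpus: int = 4,
) -> list[str]:
    """Return the argument list for a Prokka run (nothing is executed here)."""
    if kingdom not in VALID_KINGDOMS:
        raise ValueError(f"unsupported kingdom: {kingdom!r}")
    cmd = [
        "prokka",
        "--outdir", outdir,
        "--prefix", prefix,
        "--kingdom", kingdom,
        "--cpus", str(cpus),
    ]
    if genus:
        # Supplying a genus also switches on the genus-specific database tier.
        cmd += ["--genus", genus, "--usegenus"]
    if metagenome:
        cmd.append("--metagenome")
    cmd.append(assembly)  # the input FASTA goes last
    return cmd


phage_cmd = build_prokka_command("contigs.fasta", "annotation", "sample1", kingdom="Viruses")
mag_cmd = build_prokka_command("bin_07.fasta", "bin_07_annot", "bin_07", metagenome=True)
print(shlex.join(phage_cmd))
print(shlex.join(mag_cmd))
```

With Prokka installed, the returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a single string avoids shell-quoting problems with paths that contain spaces.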

Prerequisites

More from jaechang-hits/sciagent-skills

First Seen
Mar 16, 2026