plannotate-plasmid-annotation
pLannotate Plasmid Annotation
Overview
pLannotate annotates plasmid sequences by running BLAST searches against a curated library of over 5,000 features sourced from Addgene, NCBI, and fpbase. It identifies promoters, terminators, antibiotic resistance genes, origins of replication, tags, and fluorescent proteins while correctly handling circular plasmid topology — avoiding split-feature artifacts that arise from naive linear alignment. Results are written as annotated GenBank files for downstream use in SnapGene, Benchling, or BioPython, as interactive HTML plasmid maps for sharing and review, and as CSV tables for programmatic filtering. Both a Python API and a command-line interface are provided; a Streamlit web app is also bundled for exploratory use.
When to Use
- Annotating a plasmid sequence received from a collaborator or downloaded from Addgene with no accompanying map
- Verifying that all expected elements (promoter, insert, resistance marker, origin) are present after assembly or mutagenesis
- Preparing a GenBank submission or Addgene deposit that requires a complete feature table
- Batch-annotating a library of synthetic constructs produced by combinatorial cloning
- Generating a shareable interactive plasmid map (HTML) without requiring SnapGene or Benchling licenses
- Checking a de-novo synthesized gene block for unintended regulatory elements or cryptic ORFs before cloning
- Use SnapGene or Benchling instead when you need a full-featured GUI plasmid editor with primer design and cloning simulation workflows; pLannotate is best for automated, scriptable annotation
- Use Prokka instead when annotating a complete bacterial genome or a large linear chromosomal sequence; pLannotate is optimized for plasmid-sized sequences up to ~50 kb
Prerequisites
More from jaechang-hits/sciagent-skills
scientific-brainstorming
Structured ideation methods: SCAMPER, Six Thinking Hats, Morphological Analysis, TRIZ, Biomimicry, plus more. Decision framework for picking methods by challenge type (stuck, improving, systematic exploration, contradiction). Use when generating research ideas or exploring interdisciplinary connections.
12snakemake-workflow-engine
Python-based workflow management system for reproducible, scalable pipelines. Define rules with file-based dependencies; Snakemake automatically determines the execution order and parallelism. Supports local, SLURM, LSF, AWS, and Google Cloud execution via profiles; per-rule conda/Singularity environments. Use for bioinformatics NGS pipelines, ML training workflows, and any multi-step file-processing analysis. Use Nextflow instead for Groovy-based dataflow pipelines or when nf-core ecosystem integration is required.
11esm-protein-language-model
Protein language models (ESM3, ESM C) for sequence generation, structure prediction, inverse folding, and protein embeddings. Use when designing novel proteins, extracting sequence representations for downstream ML, or predicting structure from sequence. Local GPU or EvolutionaryScale Forge cloud API. For traditional structure prediction use AlphaFold; for small-molecule cheminformatics use RDKit.
11biopython-sequence-analysis
Biopython sequence analysis: parse FASTA/FASTQ/GenBank/GFF (SeqIO), NCBI Entrez (esearch/efetch/elink), remote/local BLAST, pairwise/MSA alignment (PairwiseAligner, MUSCLE/ClustalW), phylogenetic trees (Phylo). Use for gene family studies, phylogenomics, comparative genomics, NCBI pipelines. For PCR/restriction/cloning use biopython-molecular-biology; for SAM/BAM use pysam.
11shap-model-explainability
>-
11archs4-database
Query ARCHS4 REST API for uniformly processed RNA-seq expression, tissue patterns, co-expression across 1M+ human/mouse samples. Retrieve z-scores, co-expressed genes, samples by metadata, HDF5 matrices. For variant population genetics use gnomad-database; for pathway enrichment use gget-genomic-databases (Enrichr).
11