transformers-bio-nlp


Transformers for Biomedical NLP

Overview

HuggingFace Transformers provides a unified API to load, run, and fine-tune 500+ biomedical language models. The key biomedical models — BioBERT (trained on PubMed abstracts + PMC full text), PubMedBERT (trained from scratch on PubMed), BioGPT (generative, trained on PubMed), and BioMedLM — significantly outperform general-purpose BERT on biomedical NER, relation extraction, and question answering. The pipeline() abstraction handles tokenization, inference, and postprocessing in one call. Fine-tuning on task-specific labeled data (e.g., BC5CDR for chemical/disease NER) takes under an hour on a single GPU. The datasets library provides direct access to standard biomedical benchmarks.
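The pipeline() call described above can be sketched as follows. The checkpoint id, the wrapper function, and its name are assumptions for illustration — substitute any token-classification model fine-tuned on biomedical entities (e.g. a BioBERT variant):

```python
def disease_mentions(text, ner=None):
    """Return (entity_group, span, score) triples from a biomedical NER pipeline."""
    if ner is None:
        from transformers import pipeline  # lazy import: heavy dependency
        ner = pipeline(
            "token-classification",
            model="alvaroalon2/biobert_diseases_ner",  # assumed checkpoint
            aggregation_strategy="simple",  # merge word pieces into full spans
        )
    return [
        (e["entity_group"], e["word"], round(float(e["score"]), 3))
        for e in ner(text)
    ]

# Usage (downloads the model weights on first call):
# disease_mentions("Imatinib is used to treat chronic myeloid leukemia.")
```

With aggregation_strategy="simple", the pipeline merges sub-word tokens back into whole entity spans, so each result carries an entity_group label, the surface text, and a confidence score.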

When to Use

  • Extracting gene names, disease mentions, drug names, or chemical entities from biomedical abstracts (NER)
  • Classifying abstracts by topic, sentiment of clinical outcomes, or PICO elements for systematic reviews
  • Answering specific questions from biomedical literature using extractive QA (BioASQ format)
  • Generating hypotheses or summaries from biomedical text using BioGPT or BioMedLM
  • Fine-tuning a pre-trained biomedical model on a custom labeled dataset (e.g., your lab's annotations)
  • Embedding biomedical sentences for semantic similarity search across literature
  • Not a fit when you need fast rule-augmented NER (use spaCy + en_core_sci_lg) or dependency parsing (use Stanza)
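For the sentence-embedding use case above, the usual recipe is to mean-pool the encoder's last hidden state over non-padding tokens and compare vectors by cosine similarity. A minimal numpy sketch of that pooling step (with transformers you would feed AutoModel's last_hidden_state and the tokenizer's attention_mask; function names here are illustrative):

```python
import numpy as np

def mean_pool(hidden, mask):
    """Average token embeddings, ignoring padding.
    hidden: (batch, seq, dim) last hidden state; mask: (batch, seq) attention mask."""
    mask = mask[..., None].astype(hidden.dtype)      # (batch, seq, 1)
    summed = (hidden * mask).sum(axis=1)             # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # avoid division by zero
    return summed / counts

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Mean pooling with the attention mask matters because padded positions would otherwise drag every sentence embedding toward the padding token's representation.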

Prerequisites
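This section is empty in the source; presumably the minimal environment is the HuggingFace stack named in the Overview. An assumed install line (package set inferred, not confirmed by the source):

```shell
# transformers + datasets as named in the Overview; torch as the model backend
pip install transformers datasets torch
```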
