huggingface-tokenizers

Originally fromovachiever/droid-tings

Installation

SKILL.md

HuggingFace Tokenizers - Fast Tokenization for NLP

Fast, production-ready tokenizers with Rust performance and Python ease-of-use.

When to use HuggingFace Tokenizers

Use HuggingFace Tokenizers when:

Need extremely fast tokenization (<20s per GB of text)
Training custom tokenizers from scratch
Want alignment tracking (token → original text position)
Building production NLP pipelines
Need to tokenize large corpora efficiently

Performance:

Speed: <20 seconds to tokenize 1GB on CPU
Implementation: Rust core with Python/Node.js bindings
Efficiency: 10-100× faster than pure Python implementations

Installs

361

Repository

orchestra-resea…h-skills

GitHub Stars

10.5K

First Seen

Feb 7, 2026

Security Audits

Gen Agent Trust HubPass

huggingface-tokenizers — orchestra-research/ai-research-skills